knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(tntpmetrics)
library(dplyr)

TNTP uses common metrics to learn from the work project teams are doing. By using similar metrics across different projects, TNTP teams are better able to track their progress reliably and coordinate work across contexts. Common metrics also serve as the core of organization-wide goals. Though nearly all projects are using common metrics somewhere in their work, collecting common data does not guarantee each project will score or aggregate the metrics similarly. And despite scoring guidance, using valuable analyst time to walk through the steps to calculate metrics, or score teams' goals is not ideal.

The tntpmetrics package includes three handy functions that start with raw, project-collected data and calculate individual common metric scores, summarize average common metrics scores for an entire project, compare common metric scores between (typically student sub-) groups, and analyze changes in metric scores over time. Most of the work of these functions is checking for potential errors or inconsistencies in the data -- such as data not being on the proper scale, or missing but needed variables. These functions attempt to anticipate all of the potential issues that could occur between collecting raw data and calculating simple means from it. This document shows you how to use these three functions.

Current available metrics

Currently, tntpmetrics has functions to calculate and work with the following common metrics:

Practice Data Sets: ss_data_initial, ss_data_final, and ipg_data

To demonstrate how to apply the common metric functions in tntpmetrics, we will use two sets of fake student survey data. The data contains 1,000 student survey responses from 26 classes at the beginning of a project (ss_data_initial.rda) and another 1,000 student survey responses from the same 26 classes at the end of the project (ss_data_final.rda).

This data automatically comes with the tntpmetrics package. Both data sets have the same variable/column names, which include a value for each survey question from the Engagement, Relevance, and Belonging constructs. Specifically, these metrics are based on the following survey items:

These surveys items take on values of 0 (for Not True), 1 (for A Little True), 2 (for Mostly True), or 3 (for Very True). Also in the data is a class ID, and two demographic categorical character variables associated with each class.

head(ss_data_initial)

Because of the additional needed variables to use these functions with IPG data, we will also practice with the fake observation data ipg_data.rda. This data automatically comes with the tntpmetrics package. The data contains 100 observations using the IPG. All four subjects are represented.

Calculating Common Metrics

tntpmetrics contains a simple function called make_metric to attach a new column/variable to your data with the value of the scored common metric. The new column will always have the prefix cm_ followed by the name of the metric. For example, the engagement metric is simply the sum of the four engagement survey items. To use make_metric, simply provide the data set you want to use, and the metric you want calculated, making sure to put the latter in quotes. The result is your data but with new variable cm_engagement. The function also tells you how many rows of data did not have a construct created because at least one of the survey items was missing.

make_metric(data = ss_data_initial, metric = "engagement") %>%
  select(response_id, starts_with("eng_"), starts_with("cm_")) %>%
  head()

Binary version of common metric

Note that in the above, there were two new variables created: cm_engagement and cm_binary_engagement. For many common metrics, there is a cut-point on the metric scale above which, scores take on a special meaning. For engagement, for example, scores of 8 or above imply that the student in that particular response was "engaged". The variable cm_binary_engagement will be TRUE if the engagement score is above this cut-point and FALSE if not. For most common metrics, the guidance is to set goals around the actual metric score, not the binary version, as the binary version reduces the nuance of the data. However, we know teams are interested in these binary classifications to cm_binary_ variables are always created when you run make_metric as long as the metric has a defined cut-point. (The metric tntpcore does not have a defined cut-point.)

Checking for common errors

make_metric automatically checks for the most common data issues.

Misspelled Variables

First, it requires that the data have the variable names spelled exactly as above. There is nothing special about these variable names, and the function had to choose some as the default. If your data has the variable names spelled differently, then you'll have to change them before using make_metric. Otherwise, you'll get an error:

ss_data_initial %>%
  rename(eng_interest_wrongspelling = eng_interest) %>%
  make_metric(metric = "engagement")

Which variable names are needed for each metric can always be found by typing ? make_metric; they are also detailed later in this vignette.

Items on the wrong scale

Second, make_metric will check each item to ensure it's on the expected scale. For student survey items, it expects the scale of 0-3 outlined above. If any data value is outside of this scale, you'll get an error telling you which variables are out of scale and the proper scale on which they should be:

ss_data_initial %>%
  mutate(
    eng_interest = eng_interest + 1,
    eng_like = eng_like - 1
  ) %>%
  make_metric(metric = "engagement")

You will also get an error if your scales are not numeric:

ss_data_initial %>%
  mutate(
    eng_interest = case_when(
        eng_like == 0 ~ "Not True",
        eng_like == 1 ~ "A Little True",
        eng_like == 2 ~ "Mostly True",
        eng_like == 3 ~ "Very True"
    )
  ) %>%
  make_metric(metric = "engagement")

The scales needed for each metric are detailed later in this vignette.

(Optional) Censored scale use

There are times where items may be on the wrong scale, but in a way that is undetectable. For example, what if the student survey data was provided to you with each item on a scale of 1-4, but because students never responded "Very True", the data only actually has values of 1-3. Values of 1-3 are all in scale for student surveys, so that the preceding error will not occur. To account for this, make_metric automatically checks that each possible value on the scale is used and gives you a warning if that is not the case by indicating the affected variables and which value(s) they did not use:

ss_data_initial %>%
  mutate(eng_interest = ifelse(eng_interest == 0, NA, eng_interest)) %>%
  make_metric(metric = "engagement") %>%
  head()

Because this is not technically an error, you can turn off this default warning by setting scaleusewarning = F:

ss_data_initial %>%
  mutate(eng_interest = ifelse(eng_interest == 0, NA, eng_interest)) %>%
  make_metric(metric = "engagement", scaleusewarning = F) %>%
  head()

Required column names and scales for each common metric

Below are the required column names and associated scales for each metric. See The Goals Guidance Hub for more details. Note that these are the names of the columns needed in your data. It doesn't mean that every row must have a value on each of these variables. It can be okay if some of these variables have NA values for specific rows. For example, K-5 Literacy observations on the IPG require either all of the Core Actions (ca1_a, ca1_b, ca1_c, ca2_overall, ca3_overall) and/or rfs_overall. If an observation has all the core actions it still needs a variable called rfs_overall, but the value can be NA.

A note about the expectations metric

The items used to measure expectations shifted from a collection of six, mostly reverse-coded worded items to four positively worded items. Both expectations metrics are available, with the current 4-item expectations metric known as "expectations" and the older 6-item expectations metric known as "expectations_old". That is, to use the older 6-item expectations metric in this package set metric = expectations_old and to use the current 4-item expectations metric set metric = expectations.

A note about the IPG

The IPG has the most complicated scoring approach of any of the common metrics. That is because it was originally intended as a diagnostic and development (rather than evaluation) tool, has different components based on the subject matter, has indicators on different scales, and can often have layers of skip logic in the online form used to capture the data. Nevertheless, make_metric works just as easily on the IPG as it does on other metrics, but users should be aware of three things:

Because of the somewhat increased complexity, there are additional IPG examples at the end of this vignette.

Goals Analysis

In most cases, making the common metric is just an intermediate step to scoring the metric for goals purposes. tntpmetric has two functions that should make all the necessary goals calculations for you. In both cases, you do not need to create the metric ahead of time. Just provide the function your raw data, indicate the metric of interest and type of analysis needed.

Calculating the average common metric score

If you want to calculate the average common metric score at a single point in time, you can use the function metric_mean. For example, to calculate the average Sense of Belonging score in the initial survey data, simply give it your data and indicate it's the "belonging" metric. (The by_class option will be discussed below)

metric_mean(ss_data_initial, metric = "belonging", by_class = T)

metric_means estimates this mean using a multilevel model framework, and takes advantage of the R package emmeans to print the output. The overall mean is displayed in the first element of the returned list under emmean. For a more robust result, you are also provided the appropriate Standard Error (SE) and the lower and upper bounds of the 95% Confidence Interval (lower.CL and upper.cl)

Using the binary version of the variable

The function metric_mean also works on the binary version of the construct. Simply set the option use_binary to TRUE:

metric_mean(ss_data_initial, metric = "engagement", use_binary = T, by_class = T)

Because the outcome is not a TRUE/FALSE binary, the mean will always be a proportion between 0 and 1. In the above example, the value 0.238 implies that 23.8% of responses in this data set were "engaged".

Calculating the average common metric score for different groups

Many projects have equity-based goals that require looking at mean common metric scores for different types of classrooms. For example, the student survey data has a variable class_frl_cat indicating whether the response comes from a class with at least 50% of students receiving free or reduced price lunch or a class where fewer than 50% of students receive FRL. To look at the results for each group, simply include the column name as the equity_group:

metric_mean(ss_data_initial, metric = "belonging", equity_group = "class_frl_cat", by_class = T)

Now, the results show the mean for both types of classes, and include another entry to the returned list called "Difference(s) between groups" the calculates the contrast, or the difference between these group means, and gives a standard error and p-value in case it's of interest. Note that the contrast is always represented as the first group listed minus the second group listed. In this case, because the reported difference is negative, it means that classes with under 50% FRL students tended to have a higher sense of belonging score.

Equity group comparisons work even when there are more than two group values, like in the variable class_soc_cat:

metric_mean(ss_data_initial, metric = "belonging", equity_group = "class_soc_cat", by_class = T)

Because it's rare for projects to set equity goals for factors that have many different groups, metric_mean warns you if your equity_group variable has more than 5 categories; usually that means something is wrong with your variable.

The by_class option

Some metrics collect multiple data points from a single class. For example, student surveys will survey multiple students in the same class, and in many cases multiple times. Because different classes will almost surely have a different number of associated data points -- some classes might get 10 surveys, while another might get 50 -- we need an approach that doesn't over- or under-represent some classes because of differences in sample sizes. Fortunately, the multilevel models under-girding the the functions in tntpmetrics account for differences in sample sizes between classes automatically. But to make them work, you must have a variable in your data titled class_id representing each classroom's unique identifier. You must also set by_class = T as we did in the above examples.

If you do not set by_class = T and/or you do not have a class_id variable, metric_mean will not account for differences in sample sizes by class. In cases where you have multiple rows of data associated with the same class, not accounting for class IDs is statistically inappropriate and the standard errors and confidence intervals will likely be too small. Because some projects will surely forget to collect a class ID, metric_means will still give you the results even if you set by_class = F (or do not specify this option, as FALSE is the default), but will warn you about this statistical issue if you are using a metric that is expecting a class ID, like student surveys or assignments:

metric_mean(ss_data_initial, metric = "belonging")

You will not get this warning if you set by_class = F and you are analyzing a metric that is less likely to have multiple responses per class, like expectations or observations.

Calculating average growth over time

To examine how the average metric score has changed between two time points, use the function metric_growth. This function works the same as metric_mean but expects you to provide two data sets: one for the first time point (data1) and one for the later time point (data2). For example, to look at how engagement has changed over time, we can use:

metric_growth(
  data1 = ss_data_initial, 
  data2 = ss_data_final, 
  metric = "engagement", 
  by_class = T
)

In this example, the mean engagement score initially was 4.93, but increased to 5.99 by the final data collection. This difference was a growth of 1.06 points.

Using the binary version of the variable

The function metric_growth also works on the binary version of the construct. Simply set the option use_binary to TRUE:

metric_growth(
  data1 = ss_data_initial, 
  data2 = ss_data_final, 
  metric = "engagement",
  use_binary = T,
  by_class = T
)

As before, remember that the values represent proportions between 0 and 1. In the example above, 23% of responses in the initial data were engaging and 33% were engaging in the final data. The difference (0.0922) represents about 9 percentage points.

Calculating differences in growth over time between equity groups

You can also examine how growth compared between different groups by specifying he equity group:

metric_growth(
  data1 = ss_data_initial, 
  data2 = ss_data_final, 
  metric = "engagement",
  equity_group = "class_frl_cat",
  by_class = T
)

In this example, classes with at least 50% of students receiving FRL had an initial engagement score of 2.93, and then grew to 4.02 at the final data collection. Classrooms with under 50% FRL students also grew, from 6.94 to 7.97. Adding this equity_group option will directly show how the difference between the two groups varied at each time point. In this case, classes with at least 50% FRL students had engagement scores that were 4.01 points lower than other classes initially, and 3.95 points lower at the final data collection. The difference of these differences (i.e., -3.95 - -4.01 = 0.0659) is shown in the list element "Change in differences between groups over time". In this case, this difference is small and not significantly different from 0 (the p-value is 0.48), implying that the gap between these types of classrooms did not change meaningfully over time.

You must have the same group definitions in both data sets, or you'll get an error:

# Renaming FRL class variable so it doesn't match initial data
ss_data_final_error <- ss_data_final %>%
  mutate(
    class_frl_cat = ifelse(
      class_frl_cat == "At least 50% FRL",
      ">= 50% FRL",
      class_frl_cat
    )
  )
metric_growth(
  data1 = ss_data_initial, 
  data2 = ss_data_final_error, 
  metric = "engagement",
  equity_group = "class_frl_cat",
  by_class = T
)

IPG Examples

Analyzing IPG data requires additional variables beyond those used directly in scoring the construct. This is because the IPG calculates the construct differently based on the subject and grade-level of the observation. Though some of these calculations are slightly more involved, you can use the functions in tntpmetrics in the same way as the examples above. You just need to make sure that each observation has all the necessary variables.

IPG data should be long and tidy

tntpmetrics expects your IPG data to be long and tidy: each row is a separate observation, and each column should represent just one thing. That is, you should only have one column titled ca2_overall (representing the overall score to Core Action 2); you should not have a separate variable for each subject (i.e., you should not have math2_overall, rlc2_overall, science2_overall, etc.), as this would not be "tidy" because that column now contains information on two things: subjects and core action 2 scores. Instead, you must have a separate column/variable called form in your data indicating the subject on which the IPG was applied. Depending on the project, this might require you to compile all of your IPG data from separate forms in WordPress/Formidable before using the tntpmetrics package.

Additional needed variables

When working with the IPG, all observations will need a variable called form which must be "Literacy", "Math", "Science", or "Social Studies". All observations will also need a variable called grade_level, which is numeric and can range from -1 to 12, where -1 is Pre-K and 0 is Kindergarten.

All literacy observations that take place in a Pre-K to 5th grade classroom must also have a variable called rfs_overall which represents the overall score on the reading foundation skills.

Science observations must have additional Core Action 1 domains ca1_d, ca1_e, and ca1_f. How these additional domains get scored depends on the value to a gatekeeper or filter question from the form: "Did the lesson focus on a text or inquiry and scientific practice (experimentation, application, modeling, analysis, etc.)? Note that “texts” in this context also means other scientifically appropriate mediums of communication, like grade-appropriate videos, data sets, models, etc.". Thus, all science observations must also have a variable called science_filter which must have values of "Text", "Inquiry and Scientific Practice", "Both", or "Neither".

Calculating observation scores

If you have all of the needed variables, you can create the overall observation score just as we did in the above examples:

ipg_data %>%
  make_metric(metric = "ipg") %>%
  select(ends_with("overall"), col, cm_ipg) %>%
  head()

In the above, notice that Core Actions 2 and 3, and Culture of Learning were rescaled in order to calculate the IPG construct (cm_ipg) but are returned to you in their original scale of 1-4. Additionally, the oveall Core Action 1 score is created as a new variable (ca1_overall) as only the subdomains are included in the original data.

But note what happens if you do not have the variable called form:

ipg_data %>%
  select(-form) %>%
  make_metric(metric = "ipg")

... Or if you don't have a variable called grade_level:

ipg_data %>%
  select(-grade_level) %>%
  make_metric(metric = "ipg")

... Or if either of these takes unexpected values:

ipg_data %>%
  mutate(form = ifelse(row_number() == 1, "Blarg", form)) %>% 
  make_metric(metric = "ipg")
ipg_data %>%
  mutate(grade_level = ifelse(row_number() == 1, 15, grade_level)) %>% 
  make_metric(metric = "ipg")

In fact, NAs are not allowed for grade-level or form because they determine how the construct is scored. You will get an error if any of your data rows have NAs in these columns:

ipg_data %>%
  mutate(form = ifelse(row_number() == 1, NA, form)) %>% 
  make_metric(metric = "ipg")

Missing ratings

tntpmetric will attempt to create an overall IPG score for every observation in your data. However, it can only do so if all the needed domain ratings are included. In cases where an observation is missing a required domain rating -- for example, an observation is missing an overall score on Core Action 2 or Culture of Learning -- it will return an NA for the overall IPG construct value and tell you how many observations were missing key information.

In the example below, we can see this warning if we purposefully drop some key domain ratings in the data before applying the make_metric function:

ipg_data %>%
  mutate(
    col = ifelse(row_number() == 1, NA, col),
    ca3_overall = ifelse(row_number() == 2, NA, ca3_overall),
    rfs_overall = ifelse(row_number() == 14, NA, rfs_overall),
    ca2_overall = ifelse(row_number() == 14, NA, ca2_overall)
  ) %>% 
  make_metric(metric = "ipg") %>%
  select(observation_number, ends_with("overall"), col, cm_ipg) %>%
  slice(1:5, 14)

Note that these messages are warnings, and the code will still run. But the affected observatinos now have NAs for their value for the overall construct (cm_ippg).

Science observation scores

The (at the time of this writing) new science IPG form has more Core Action 1 subdomain ratings to account for the variety of lessons science observers might see. tntpmetrics accounts for these scoring approaches, but you must include a variable called science_filter in your data to tell the function which Core Action 1 values to expect. Without this variable, you will get an error:

ipg_data %>%
  filter(form == "Science") %>%
  make_metric(metric = "ipg") %>%
  select(science_filter, starts_with("ca"), col, cm_ipg) %>%
  head()

ipg_data %>%
  filter(form == "Science") %>%
  select(-science_filter) %>%
  make_metric(metric = "ipg")

IPG goals

If you have all of the needed variables with proper, allowable values, then the functions in tntpmetrics to calculate means (overall or by group) or changes over time work identically to the student survey examples above. For instance:

ipg_data %>%
  metric_mean(metric = "ipg")

Questions?

Contact Adam Maier or Cassie Coddington with questions.



adamMaier/tntpmetrics documentation built on Feb. 1, 2022, 1:03 p.m.