```{=html}
```r knitr::opts_chunk$set(echo = TRUE) library(etlTurtleNesting) library(drake) library(wastdr) library(pointblank) library(magrittr) library(reactable) library(gt) readd(odkc_ex) readd(user_mapping) loadd(odkc_ex) loadd(user_mapping)
a <- user_mapping %>% annotate_user_mapping() %>% pointblank::create_agent() %>% pointblank::col_vals_lt("dist", 0.01) %>% interrogate() actions_table <- a %>% pointblank::get_data_extracts() %>% magrittr::extract2(1) %>% dplyr::rowwise() %>% dplyr::mutate( active_at = get_user_area(odkc_ex, odkc_username), dist = round(dist,3) ) %>% dplyr::select(-email, -phone) %>% dplyr::arrange(active_at, -dist) %>% gt::gt() %>% gt::fmt_markdown(columns = gt::everything()) %>% gt::cols_label( odkc_username = gt::html("<h5>Cleaned</h5>\n<small>ODK Collect username</small>"), active_at = gt::html("<small>Active at</small>"), wastd_matched = gt::html("<h5>We matched</h5>\n<small>WAStD User</small>"), search_wastd = gt::html("<h5>Search</h5>\n<small>likely candidates</small>"), dist = gt::html("<h5>Dissimilarity</h5>\n<small>smaller is better</small>") ) %>% gt::tab_spanner( label = "WASTD User match", columns = c(odkc_username, active_at, wastd_matched, search_wastd, dist) ) %>% gt::tab_spanner( label = "WASTD User profile (chosen as best match)", columns = c(role, pk, username, name, nickname, aliases) ) %>% gt::tab_style( style = list( gt::cell_text(weight = "bold", color="red") ), locations = gt::cells_body( columns = dist, rows = dist > 0.1 ) ) %>% gt::tab_style( style = list( gt::cell_text(color="orange") ), locations = gt::cells_body( columns = dist, rows = dist < 0.1 ) )
Work through the issues directly in the table below.
Close matches (typos) have a dissimilarity distance of less than 0.1. For these there likely exists a WAStD user profile already, and their mis-spelled name can be added as an alias.
Likely mismatches have a distance of more than 0.1 and are highlighted red. For these we are likely missing a WAStD user profile.
Your actions:
Users who put team lists in username - please re-train ASAP:
If a "We matched WAStD User" is definitely not the person from the "ODK Collect username":
If a "We matched WAStD User" is the person from the "ODK Collect username" but has a bad match:
This happens e.g. when the ODK Username is a part (given name), an alias (abbreviation), or a team list.
Open the "We matched WAStD user" profile link and update the alias with the ODK username.
Next time we run the user matching, the match should come up with a lower dissimilarity measure (or even drop off the validation shortlist).
actions_table
r Sys.time()
.r odkc_ex$downloaded_on
.During the data import from ODK Central to WAStD, wastdr maps each distinct ODK Collect "username" as written by data collectors to actual WAStD user profiles.
The matching is done by
fuzzyjoin::stringdist_left_join
using the Jaro-Winker distance between the ODKC username and the WAStD
name
and aliases
. Aliases are split up by comma.
The Jaro-Winker distance was chosen as it returns the highest number of correct
matches.
Details about available distance measures can be found here.
You can download a spreadsheet of mismatches from the CSV button:
a
Note: updating WAStD will not refresh this report automatically - we have to re-run the data import.
When we re-run the user matching with fresh data, users with updated aliases should match up.
The QA validation will pick up all username matches with a dissimilarity of 0.01 or higher. Perfect matches will have a dissimilarity below 0.01.
Coordinators of data capture programs supply us with a spreadsheet in this exact format:
Column "name": The full name, including given and surname, e.g. "Florian Mayer"
Column "email": A valid email, e.g. Florian.Mayer@dbca.wa.gov.au
.
<
pointy
brackets>
.Column "phone": A valid phone number with the Australian prefix +61
.
We add the following columns:
florian_mayer
.Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.