In kamclean/collaborator: Scalable multi-centre research using R and REDCap

knitr::opts_chunk$set(collapse = FALSE)
library(collaborator); library(dplyr)

CollaboratoR: Group-specific emails (mailmerge in R)

In large-scale multicentre reseach, communication with data collectors in a meaningful way can be challenging. Often, group-specific emails (with group-specific attachments) can be desired (for example reports of missing data for a particular site). Yet there is a limited number of non-proprietary softwares that allow this to be automated at scale, and often this is required to be done on a manual basis.

CollaboratoR has several functions that have been designed to work together to faciliate the process of sending group-specific emails (including attachments). This has been developed with interoperability with REDCap in mind, but the "email_" functions do not require REDCap to work.

Why would you choose to do this over mailmerge or other equivalent software?

The main advantage of this method is the capability to attach group-specific attachments and to reproducibility email regular updates.

1. Build email dataset

The first step is to define the groups of people that will be emailed.

a). REDCap user export

For projects on REDCap, all users with access rights to each data access group (DAG) can be accessed via the API.

This can be done via the user_role() CollaboratoR function (see Redcap User Management: 1. Explore Current Users for more details), alongside several other functions from other packages ( redcapAPI / REDCapAPI / RCurl ).

df_user_all <- collaborator::user_role(redcap_project_uri = Sys.getenv("collaborator_test_uri"),
                                       redcap_project_token = Sys.getenv("collaborator_test_token"))$full

knitr::kable(head(df_user_all, n=10)) # Please note all names / emails have been randomly generated

However, these do not produce the correct data format as the data (email) must be summarised by group (data_access_group). This can be done directly via the user_summarise() function. This produces data in the exact format required by the subsequent "email_" functions (alongside some additional summarised data which may be of interest). This is the recommended option for handing REDCap user data for this purpose.

df_user <- collaborator::user_summarise(redcap_project_uri = Sys.getenv("collaborator_test_uri"),
                                        redcap_project_token = Sys.getenv("collaborator_test_token"),
                                        user_exclude = "y_o’doherty")

knitr::kable(df_user) # Please note all names have been randomly generated

b). Other sources

While these email functions were developed for the intent of integration with REDCap, the subsequent "email_" functions are build to not require REDCap to work. However, the same minimum input format is required:

One unique group per row (listed within a single column).
A string of group-specific email addresses, separated by a semicolon (listed within a single column).

For example (using the data above to illustrate):

df_user_other <- df_user %>%
  dplyr::select("group" = data_access_group, "group_specific_emails_recipients" = user_email)

  knitr::kable(df_user_other)

There may be any number of additional columns present that can be used to "mail-merge" within the email subject or body.

2. Build group-specific emails

At this stage, we have a dataframe of the grouped recipents of the emails. Now we can begin to build the group-specific components of the emails.

a). Generate email fields

The email_field() function wrangles the dataframe of grouped recipents (e.g. df_user above) into the format required by email_send(). There are two aspects of email fields that can be customised:

Recipients: We define who will recieve the email. Different columns of emails can be specified as the main recipents (recipient_main), cc'd (recipient_cc), or bcc'd (recipient_bcc). For example, you may want to specify the primary investigator for a site as the recipient_main and others involved at that site as recipient_cc.
Subject: We define what the name of the email will be. This can be the same for all emails, or the subject can be made group-specific by incorporting column names within the string. These names must be within square brackets e.g. "[colname]".

df_email <- collaborator::email_field(df_email = df_user,
                                      group = "data_access_group",
                                      recipient_main = NULL,
                                      recipient_cc = NULL,
                                      recipient_bcc = "user_email", # we want all the recipients to be "BCC". 
                                      subject = "Hello to [user_n] collaborators at [data_access_group]")

knitr::kable(df_email)

b). Add email body

The email_body() function will perform a mailmerge using a specified rmarkdown file (e.g. "vignette_email_body.Rmd") and the output from email_field(). This will form the email body to be sent via email_send().

The rmarkdown file must have the YAML specified as "output: html_document" and fields to be mailmerged must be specified as "x$colname" (see "vignettes/vignette_email_body.Rmd"). The text format / spacing/ etc can be edited as usual for html rmarkdown documents.
The output from email_body() is the original dataframe with up to 2 columns appended (based on the html_output parameter):
"file" - The path for group-specific html file produced (can be viewed to verify the rendered Rmd document displays as desired). These will only be saved if html_output includes "file". Note: All files will be placed in a subfolder specified by subfolder (default = "folder_html" within current working directory), and will be named according to the group name (this can be customised using file_prefix and file_suffix).
"code" - The html code for the mailmerged html document produced. Note: at present, GmailR does not allow allow html files to be directly attached as the email body. Therefore, html_output should always include "code" and this should be used as the email body in email_send().

df_email <- collaborator::email_body(df_email, 
                                     html_output = "code", 
                                     rmd_file = here::here("vignettes/vignette_email_body.Rmd"))

tibble::as_tibble(df_email)

c). Add email attachments

The group2csv() function will split a tibble/dataframe by "group" variable, then save grouped data in a subfolder as individual CSV files. This can be used as a group-specific attachment to be sent via email_send().

The groups in the tibble/dataframe supplied (data) must exactly match the groups within the email_field() function output.
The output from email_body() is a tibble with the "group":
"file" - The path for group-specific CSV files produced (not any data not in a group will be discarded). Note: All files will be placed in a subfolder specified by subfolder (default = "folder_csv" within current working directory), and will be named according to the group name (this can be customised using file_prefix and file_suffix).

# Generate patient-level / anonomysed missing data report
report <- collaborator::report_miss(redcap_project_uri = Sys.getenv("collaborator_test_uri"),
                                    redcap_project_token = Sys.getenv("collaborator_test_token"))$record


attach <- collaborator::group2csv(data = report,
                                  group = "redcap_data_access_group",
                                  subfolder = here::here("vignettes/folder_csv"), file_prefix = "missing_data_")

knitr::kable(attach)

The output from group2csv() and any other attachments must be appended to the output from email_field() as additional columns.

Files that are intended to be sent to all groups can be appended here in addition (any file type).

df_email <- df_email %>%
  dplyr::left_join(attach, group=c("redcap_data_access_group" = "group")) %>%

  dplyr::mutate(file2 = here::here("man/figures/collaborator_logo.png"))

Note: Only group2csv() will be supported as a function for creating group-specific files en mass (e.g. rather than pdf, word, etc) as these are too heterogeneous and specific in their purpose. However, once created via rmarkdown, their pathways can be joined to the correct group and will be attached if included in the email_send() function.

3. Send group-specific emails

We now have created our four components to the group-specific email:

Group (Essential): Column created via email_field() ("group").
Email fields (Essential): Columns created via email_field() ("recipient_main", "recipient_cc", "recipient_bcc", "subject") .
Email body (Essential): Column created via email_body() ("code").
Attachments (Optional): Column created via group2csv() (e.g. "file") or added manually (e.g. "file2").

a). Gmail connection set-up

The email_send() function is currently built for use with gmail via the gmailr package. To function, a connection to the Gmail API must be established in advance (find step by step process here).

Note: Rstudio hosted on virtual machines have difficulty in connecting to gmail. For sending emails, an Rstudio installed on a physical computer is needed at present.

gmailr::gm_auth_configure(path = "gmail.json")

b). Sending

The email_send() will allow automated sending of the prepared group-specific emails and their attachments. The following parameters can be specified:

sender: The email account from which the emails will be sent (must be a gmail account and match the )
body: The column containing the html code produced by the email_body() function for each group.
attach: An (optional) list of columns containing the paths of all files to be attached (including those which are group-specific). If zip = TRUE (default = FALSE), then these files will be compressed into a zip folder.
draft: To prevent premature sending of emails, the default setting is to send emails to the gmail draft folder (draft = FALSE). When ready to send emails automatically this should be changed to draft = TRUE. Note: gmail allows a maximum of 500 emails to be sent per day - if you plan to exceed this number then split the dataset and send on different days.

collaborator::email_send(df_email = df_email,
                         sender = "email@gmail.com", 
                         email_body = "code", 
                         attach = c("file", "file2"), zip = T,
                         draft = FALSE)