In RGLab/R2i: Tools for preparing an ImmPort Data Submission

knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
options(DT.options = list(paginate = FALSE, info = FALSE, filter = FALSE))

OVERVIEW

The purpose of this vignette is to demonstrate how to use the functions and data in the R2i package to create the basicStudyDesign.txt that must be submitted along with assay and other metaData to the public ImmPort database at NIH. When creating your own submission documents, you can create a similar vignette so that the whole process is reproducible.

PREPARING THE STUDY DATA FRAME

First, we will load the R2i library, which has the functions and templates we need.

library(R2i)

Next, we need to create the 11 dataframes that will be put together to form the larger txt:

study
study_categorization
study_2_condition_or_disease
arm_or_cohort
study_personnel
planned_visit
inclusion_exclusion
study_2_protocol
study_file
study_link
study_pubmed

The first dataframe, called study is unique in that it has two columns, one with names and the other with values. In this way it functions more like a named list and is treated differently from most other dataframes that have a typical format with column headers and unlimited rows allowed.

In order to get a pre-made dataframe for study we use the getTemplateDF() function.
The only argument is the name of the template.

study <- getTemplateDF("study")
DT::datatable(study, options = list(scrollX = TRUE))

Now that we have the study dataframe, the easiest way to edit the dataframe by hand is to use the edit() function and save the output by clicking "quit" in the editor. This function is found in the utils package, but depends on the X11 library. If you are on a newer mac you may need to install XQuartz from the project's website as it is no longer bundled. If for some reason edit() does not save the output correctly on your machine, then you will have to input information using an R-based approach.

edit(study)

An R-based approach example:

study[1, ] <- list(
  NA,
  "myBriefTitle",
  "myOfficialTitle",
  "This is a study",
  "This is a longer description of the study",
  "We hypothesize ... ",
  "The objectives were ... ",
  "Endpoints are ... ",
  "NIH",
  50,
  10,
  30,
  "years",
  "01/01/2018",
  "01/01/2018"
)

We can check the study template to see if it passes the basic checks for class, dimension, required columns, data types, and controlled terms by using checkTemplate(). This function takes in the data frame we have created as well as the template name. Any problems will throw an error, so for demonstration purposes we wrap the function in a tryCatch() method here.

# NOTE: messages are printed out
results <- tryCatch(checkTemplate(df = study),
  error = function(e) {
    return(e)
  }
)

# to see error statement
results

It looks like we have an NA value for study$'User Defined ID'. Let's correct this and re-run our checks.

study$`User Defined ID` <- "sdyID"

# NOTE: messages are printed out
results <- tryCatch(checkTemplate(df = study),
  error = function(e) {
    return(e)
  }
)

# to see error statement
results

It seems we are using a non-controlled term in Age Unit column, which requires a controlled value.
Many templates use controlled or preferred terms to help maintain standardized terms across studies, so it is important for us to correct this before writing out the template.

R2i::checkTemplate() will throw an error for columns with non-matching controlled values. However, to see preferred value columns with issues, you must change the default quiet argument to FALSE to receive messages.

In the case of study, we can see which columns have such terms by using the getLookups() function. This function's only argument is the ImmPort Template Name.

getIPLookups(ImmPortTemplateName = "study")

It appears that study has one column with controlled terms: Age Unit. To see what values are allowed for Age Unit we can use the getLookupValues() function. This function takes in the ImmPort Template Name and the column name as arguments, then returns a vector of allowed values.

getIPLookupValues(ImmPortTemplateName = "study", templateColname = "Age Unit")

Since it looks like the original entry for Age Unit ('years') is not in the vector, we must correct it with a capitalized version so it passes the checkTemplate() function later on. We can then see if our data frame passes our checks again.

study$`Age Unit` <- "Years"

# NOTE: messages are printed out
results <- tryCatch(checkTemplate(df = study),
  error = function(e) {
    return(e)
  }
)

# to see error statement
results

PREPARING OTHER DATA FRAMES

Similar to study, the other 9 dataframes that make up the basicStudyDesign.txt can be accessed with the getTemplateDF() function. Depending on the amount of information that needs to be entered, it may be easier to use edit() or base R. For the purpose of the vignette, an R-based approach is used for reproducibiity and demonstration.

The next one is a simple one called study_categorization that defines the type of study performed

study_categorization <- getTemplateDF("study_categorization")
study_categorization[1, ] <- c("Immune Response")
DT::datatable(study_categorization, options = list(scrollX = TRUE))

The next one is a simple one called study_2_condition_or_disease that defines the disease or condition studied.

study_2_condition_or_disease <- getTemplateDF("study_2_condition_or_disease")
study_2_condition_or_disease[1, ] <- c("Typhoid")
DT::datatable(study_2_condition_or_disease, options = list(scrollX = TRUE))

The next one we focus on is called arm_or_cohort. In this demonstration case, we have a csv already of the information needed and just want to bind this new information to correct headers. Therefore we use the arm_or_cohort dataframe only for the colnames() call. An important note: the checkTemplate() function needs the "templateName" attribute in the data frame in order to run the necessary checks. This is easily done by using attr(arm_or_cohort, "templateName") <- "arm_or_cohort".

arm_or_cohort <- getTemplateDF("arm_or_cohort")
file_path <- system.file("extdata/arm_or_cohort_demo.tsv", package = "R2i")
aocImport <- read.table(file_path, sep = "\t", stringsAsFactors = FALSE)
colnames(aocImport) <- colnames(arm_or_cohort)
aocImport <- aocImport[aocImport$`User Defined ID` != "", ]
DT::datatable(aocImport, options = list(scrollX = TRUE))

# to be consistent we rename aocImport for use in the 'write' functions later
arm_or_cohort <- aocImport

# Set "templateName" attribute to pass `checkTemplate()` fn
attr(arm_or_cohort, "templateName") <- "arm_or_cohort"

Making study_personnel:

study_personnel <- getTemplateDF("study_personnel")
study_personnel[1, ] <- c(
  "Personnel1",
  "Dr.",
  "Khanna",
  "Elizabeth",
  "",
  "Major University",
  123,
  "ekhanna@major.edu",
  "PI",
  "Principal Investigator",
  "Major University"
)
DT::datatable(study_personnel, options = list(scrollX = TRUE))

Making planned_visit:

planned_visit <- getTemplateDF("planned_visit")

planned_visit[1, ] <- list(
  1,
  "Screening",
  1,
  -10,
  -2,
  "",
  ""
)

planned_visit[2, ] <- list(
  2,
  "Immunazation",
  2,
  0,
  0,
  "",
  ""
)

planned_visit[3, ] <- list(
  3,
  "Chellenge",
  3,
  100,
  110,
  "",
  ""
)

DT::datatable(planned_visit)

Making inclusion_exclusion:

inclusion_exclusion <- getTemplateDF("inclusion_exclusion")
inclusion_exclusion[1, ] <- c(
  "InclExcl1",
  "older than 35 years old",
  "Exclusion"
)
DT::datatable(inclusion_exclusion)

study_2_protocol is different than other templates. It is a small dataframe with only 1 row and two columns with the first column being a name and the second being he value.

study_2_protocol <- getTemplateDF("study_2_protocol")
study_2_protocol[1, ] <- "protocol 3445"
DT::datatable(study_2_protocol)

Making study_file:

study_file <- getTemplateDF("study_file")
study_file[1, ] <- list(
  "Appendix.txt",
  "Study Appendix",
  "Study Data"
)
DT::datatable(study_file)

Making study_link:

study_link <- getTemplateDF("study_link")
study_link[1, ] <- c(
  "main website",
  "https://drkhannalab.major.edu/NewStudy1"
)
DT::datatable(study_link, options = list(autoWidth = TRUE))

study_pubmed is going to be left blank as a demonstration since some studies may need to be imported prior to being published. Publication information can be entered later using an update template.

study_pubmed <- getTemplateDF("study_pubmed")

Before transforming our data frames and writing the tsv output, we can also do some quality assurance checks of the text in our data frames using the text cleaning functions in the R2i package.

We will use planned_visit$Name as an example to first check for spelling errors using the checkSpelling() function that imports the hunspell package.

checkSpelling(input = planned_visit$Name)

If working interactively, you can use interactiveReplace() to go through each error found with checkSpelling() and input a replacement at the prompt. In this case, we will simply use the findReplace() function to fix Immunazation and Chellenge.

tmp <- findReplace(input = planned_visit$Name, find = "Immunazation", replace = "Immunization")
tmp <- findReplace(input = tmp, find = "Chellenge", replace = "Challenge")

planned_visit$Name <- tmp
DT::datatable(planned_visit)

CREATING THE OUTPUT TSV

To create the tsv that will be included in the ImmPort submission, we use the transform_basicStudyDesign() function that takes a named list of the 9 dataframes as the first argument, as well as outputDir and validate. The outputDir argument is the filepath for the output directory where the tsv should be saved. validate is a boolean with TRUE as the default that uses the validator scripts from ImmPort's web application to ensure that the tsv meets the criteria necessary for import. To see more information about the transform_basicStudyDesign() function you can always enter ?transform_basicStudyDesign in the console.

blocks <- list(
  "study" = study,
  "study_categorization" = study_categorization,
  "study_2_condition_or_disease" = study_2_condition_or_disease,
  "arm_or_cohort" = arm_or_cohort,
  "study_personnel" = study_personnel,
  "planned_visit" = planned_visit,
  "inclusion_exclusion" = inclusion_exclusion,
  "study_2_protocol" = study_2_protocol,
  "study_file" = study_file,
  "study_link" = study_link,
  "study_pubmed" = study_pubmed
)

temp <- tempdir()
transform_basicStudyDesign(
  blocks = blocks,
  outputDir = temp,
  validate = TRUE
)