knitr::opts_chunk$set(echo = TRUE)

This document answers the following questions through the example of question notes: "How do I find specific information or properties about questions from a QSF file?" "How do I find which questions have specific information or properties in their structure?

This document demonstrates the use of for loops, the *apply family of functions, and some data type conversion and constructions with lists, vectors, and dataframes.

Finally, this document answers one more question: How do I view dataframes with large amounts of text easily in R?

# Load the local version of the package
devtools::load_all("Q:/Student Work/Emma's Student Work/Report Generation/QualtricsTools")
# Load an example survey with notes.
get_setup(headerrows=3, 
          qsf_path = file.path(
            path.package("QualtricsTools"),
            "/data/Sample Surveys/User Notes Survey/Notes_Survey.qsf"
            ),
          csv_path = file.path(
            path.package("QualtricsTools"),
            "/data/Sample Surveys/User Notes Survey/Notes_Survey.csv"
            )
)

First, how do we find questions which have user notes?

When we use get_setup, this calls get_coded_questions_and_blocks, which inserts notes into questions as list elements of question[['qtNotes']] with "User Note: " prepended.

This just prints the questions' data export tags to the screen for each question that has user notes added.

questions_with_notes <- list()  # Make an empty list for storing DataExportTags
for (question in questions) {
  if ('qtNotes' %in% names(question)) { # Check that question[['qtNotes']] exists
    # user_notes_tf is a list of TRUE/FALSE values for each qtNote that indicates
    # whether or not "User Note: " is a substring of the qtNote.
    user_notes_tf <- grepl("User Note: ", question[['qtNotes']])
    # If there are any qtNotes which are user notes as indicated by user_notes_tf
    # for the current question, append the current question's dataexporttag to the
    # list of questions_with_notes.
    if (any(user_notes_tf)) {
      questions_with_notes <- c(questions_with_notes,
                                question[['Payload']][['DataExportTag']])
    }
  }
}
print(questions_with_notes)

There are two ways that one could make the above code into a function: By simply wrapping this with something like

find_user_notes_questions <- function(questions) {
...
}

Or, we could turn the inside logic which determines whether or not a question has user notes into a function that takes an argument of a question which we can apply to the entire list of questions. This is a more modular approach, and is preferable because then we can use the "has_user_notes(question)" function in more ways. Here I will demonstrate the second of these ways.

has_user_notes <- function(question) {
  if ('qtNotes' %in% names(question)) { # Check that question[['qtNotes']] exists
    # user_notes_tf is a list of TRUE/FALSE values for each qtNote that indicates
    # whether or not "User Note: " is a substring of the qtNote.
    user_notes_tf <- grepl("User Note: ", question[['qtNotes']])
    # If there are any qtNotes which are user notes as indicated by user_notes_tf
    # for the current question, append the current question's dataexporttag to the
    # list of questions_with_notes.
    if (any(user_notes_tf)) {
      return(TRUE)
    }
  }
  # If the previous return wasn't executed, then the question either has no qtNotes
  # or no notes with "User Notes: " as a substring.
  return(FALSE)
}

Now we use sapply to run the has_user_notes function on every question and return a logical vector with TRUE/FALSE values for each question. We're going to use sapply as opposed to lapply because lapply always returns a list structure (which is not necessarily of homogenous datatype, i.e. list(TRUE, "a") is a valid list in R) whereas sapply simplifies the output to a homogenous datatype such as vector or matrix where possible. (vectors and matrices are "atomic" data structures in R, meaning that they are homogenous in the sense that all their entries must be of the same datatype so that the vector is strictly one of "logical", "integer", "numeric", "double", "complex", "character" or "raw".

sapply(questions, has_user_notes)

Now that we have a logical vector, we can find out which indices of questions have user notes.

questions_with_notes_indices <- which(sapply(questions, has_user_notes))

To just get the dataexporttags of these questions, something like this is effective:

dataexporttags_with_notes <- sapply(questions_with_notes_indices, function(i) {
  questions[[i]][['Payload']][['DataExportTag']]
  })

If we want a list of these questions, we can use list subsetting:

questions_with_notes <- questions[questions_with_notes_indices]

Using str to see the structure of questions_with_notes, we can see that it is in fact a list of three questions that can be verified to have data export tags.

str(questions_with_notes, max.level = 1)

This function allows us to input a specific question and get back a list of the user notes. This uses gsub to remove the prepended "User Note: " portion of the user notes.

list_user_notes <- function(question) {
  if ('qtNotes' %in% names(question)) { # Check that question[['qtNotes']] exists
    # user_notes_tf is a list of TRUE/FALSE values for each qtNote that indicates
    # whether or not "User Note: " is a substring of the qtNote.
    user_notes_tf <- grepl("User Note: ", question[['qtNotes']])
    # If there are any qtNotes which are user notes as indicated by user_notes_tf
    # for the current question, append the current question's dataexporttag to the
    # list of questions_with_notes.
    if (any(user_notes_tf)) {
      user_notes <- question[['qtNotes']][which(user_notes_tf)]
      cleaned_user_notes <- sapply(user_notes, function(x) gsub("User Note: ", "", x))
      return(cleaned_user_notes)
    }
  }
}
list_user_notes(questions[[1]])

We can pair the output information from the list_user_notes with the data export tag in order to be able to create a character matrix which shows for each data export tags what user notes are associated to it. The data export tags will appear in the first row, and the notes will appear in the second.

table_question_notes <- function(question) {
  sapply(list_user_notes(question), function(x) {
    c(question[['Payload']][['DataExportTag']], x)
    })
}
table_question_notes(questions[[1]])

table_all_notes <- function(questions) sapply(questions, table_question_notes)
all_notes <- table_all_notes(questions)
all_notes

Notice a few things about table_all_notes: It outputs a list, which is not yet a dataframe as we would like the end result to be. If a question does not have any notes, then for the ith question it will include an empty list() in its output (where i represents an actual integer). We can replicate this scenario by just removing the qtNotes from questions[[2]].

questions[[2]][['qtNotes']] <- NULL
all_notes <- table_all_notes(questions)
all_notes

To remove the empty list() elements in , we need a delete.NULLs function which removes any empty sublists. The unlist function returns a flat atomic vector (coercing entries into the same datatype if need-be) which we can use to find list elements which have no contents, i.e. length 0. We then subset including only elements which when unlisted have length not 0 and return the sublist.

delete.NULLs  <-  function(x){
  x[unlist(lapply(x, length) != 0)]
}

We can use all_notes to delete the empty list in [[2]].

all_notes <- delete.NULLs(all_notes)
all_notes

Now we can actually turn this list into a useable dataframe by simply using as.data.frame:

notes_df <- as.data.frame(all_notes)
notes_df

Let's get rid of those unnecessary and unhelpful column names.

colnames(notes_df) <- NULL
notes_df

If we use a more complicated survey, we can quickly see that we ought to transpose this dataframe for readability.

get_setup(headerrows=3, 
          qsf_path=file.path("Q:/Student Work/Emma's Student Work/",
                             "Debugging QualtricsTools (Christian)/",
                             "First Year survey 2017/QT Notes issue 5-26-17/",
                             "2017_First_Year_Experience_Survey.qsf"),
          csv_path=file.path("Q:/Student Work/Emma's Student Work/",
                             "Debugging QualtricsTools (Christian)/",
                             "First Year survey 2017/QT Notes issue 5-26-17/",
                             "2017 FY Survey RAW.csv")
)
all_notes <- table_all_notes(questions)
all_notes <- delete.NULLs(all_notes)
notes_df <- as.data.frame(all_notes)
colnames(notes_df) <- NULL
notes_df <- t(notes_df)

Now answer the final question in this document: How to view such a dataframe easily in R? If we print it on the console, we might not be able to see adjacent columns appear next to each other.

In order to be able to render the dataframe such that we can see the full contents of each entry, I am using the knitr package's kable function to render an HTML version of the dataframe. The RStudio viewer also does not allow you to view the contents very well either; Viewing a dataframe with View(notes_df) often truncates the data with no way to get around it. I will only table the head of the dataframe, as to not clutter this document.

knitr::kable(head(notes_df), format="html")

If you're actually using the R console, you can use knitr::kable without being in an Rmarkdown document like I am by saving the kable HTML as a temporary file and rendering it with the Rstudio viewer.

knitr_view <- function(df) {
  tempDir <- tempfile()
  dir.create(tempDir)
  htmlFile <- file.path(tempDir, "notes_dataframe.html")
  fileConnection <- file(htmlFile)
  writeLines(knitr::kable(df, format="html"), fileConnection)
  close(fileConnection)
  rstudioapi::viewer(htmlFile)
}

Rstudio Viewer

Knitr Generated HTML

knitr_view(notes_df)

Conclusion

I hope this document was helpful in explaining how to find and select questions based on particular criterion, how to find specific information inside questions, and how to view the data you find more easily. Further directions I would suggest include learning about the apply family in the base-R package and about iteration in Hadley Wickham's R for Data Science book.



ctesta01/QualtricsTools documentation built on May 14, 2019, 12:27 p.m.