preprocess: Importing and Preprocessing Longitudinal Illusory Truth Data
In truthiness: Illusory Truth Longitudinal Study


Functions to import and preprocess raw (or simulated) data.

preprocess(path, outpath = NULL, report = NULL)

preprocess_simulated(path, outpath = NULL, report = NULL)

import_sessions(path)

import_sessions_simulated(path)

import_phase_info(path)

import_phase_info_simulated(path)

import_cjudgments_simulated(path)

import_cjudgments(path)

import_tratings(path)

import_tratings_simulated(path)

read_sessions(path)

read_sessions_simulated(path)

read_cjudgments(path)

read_cjudgments_simulated(path)

read_tratings(path)

read_tratings_simulated(path)

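`path`	Path to the directory containing raw data files (for the read_* functions, the path to a single raw data file).
`outpath`	Path to the directory where anonymized data will be saved.
`report`	Filename of the HTML preprocessing report; if NULL, a filename is generated automatically.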

The purpose of these functions is to import, transform, and anonymize raw data files from the Truth Trajectory study by Henderson, Simons, and Barr (2021). Because few users other than the researchers will have access to the original non-anonymized data, functions are also supplied to perform the same set of actions on simulated data. There are therefore two versions of each function: an original version (e.g., preprocess) and a simulated version (e.g., preprocess_simulated). The simulated versions were built during the planning stage of the study, based on assumptions about the structure of the raw data files that turned out to be incorrect once we obtained pilot data. Rather than laboriously rewrite the simulation functions to match the new data structure, we decided to preserve the old functions and split them off from the new versions. Both versions perform the same set of actions and yield the same end products, but they import and transform the data differently because the raw data files are structured differently.

The "preprocess" functions are the high-level functions and the only ones that most users will need. The "import" and "read" functions are lower-level functions that are called by the "preprocess" functions; they are described here for completeness.

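For preprocess and preprocess_simulated, a string with the path to the generated HTML report. The import_* functions return a (non-anonymized) data frame.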

Generally, users will not have access to the non-anonymized raw data and so will not need to use any of these functions, except when working with simulated data. The data objects resulting from the preprocessing of the original raw data are available as built-in data objects documented in truth_trajectory_data. Users interested in reproducing the results from the anonymized data should start with the documentation for reproduce_analysis.

The preprocess functions load the data from the raw data files and write out (1) non-anonymized, preprocessed data files; (2) anonymized, preprocessed data files; and (3) an HTML report. They do this by running scripts derived from R Markdown templates included in the package. It is not necessary to view these scripts, but if you wish to do so, use rmarkdown::draft; RStudio users can also access the templates from the "New File > R Markdown" pull-down menu by selecting the appropriate template in the dialog box.

To access the preprocessing script for simulated data:

rmarkdown::draft("preprocessing-simulated.Rmd", "illusory-truth-preprocessing-sim", "truthiness")

and the preprocessing script for real data:

rmarkdown::draft("preprocessing.Rmd", "illusory-truth-preprocessing", "truthiness")

The preprocessing script outputs four anonymized data files into the subdirectory named in the outpath argument. For maximum portability, each file is stored in two formats: binary (RDS) and comma-separated values (CSV). These files are called ANON_sessions, ANON_phases, ANON_categories, and ANON_ratings, and the data they contain are described in the codebook.
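
As a sketch of what the two formats mean in practice, the R snippet below saves a toy table both ways and reads it back with base R (readRDS and read.csv). It assumes the files use the extensions .rds and .csv, and its columns are hypothetical placeholders rather than the real ANON_sessions variables, which are listed in the codebook.

```r
# Sketch: one anonymized table stored in both formats, then read back.
# ASSUMPTIONS: file names "<name>.rds" / "<name>.csv"; the columns below are
# hypothetical placeholders, not the real ANON_sessions variables.
anon_dir <- file.path(tempdir(), "anon_demo")
dir.create(anon_dir, showWarnings = FALSE)

toy <- data.frame(ID = c("S001", "S002"), age = c(25L, 31L),
                  stringsAsFactors = FALSE)
saveRDS(toy, file.path(anon_dir, "ANON_sessions.rds"))
write.csv(toy, file.path(anon_dir, "ANON_sessions.csv"), row.names = FALSE)

# RDS preserves R column types exactly; CSV is readable by any software,
# but column types are re-inferred when the file is read back
from_rds <- readRDS(file.path(anon_dir, "ANON_sessions.rds"))
from_csv <- read.csv(file.path(anon_dir, "ANON_sessions.csv"),
                     stringsAsFactors = FALSE)
all.equal(from_rds, from_csv)
```

For simple column types like these the two copies agree; for factors, dates, or other classes, the RDS version is the safer choice within R.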

In addition to the anonymized data, the preprocessing scripts output two files with non-anonymized data. These files contain sensitive information (Prolific IDs and answers to open-ended questions) and are named NOT_ANONYMIZED_sessions.rds and NOT_ANONYMIZED_phases.rds. They are written to the "target directory", which is the directory immediately above the anonymized-data subdirectory specified by outpath. If outpath is NULL, a subdirectory for the anonymized files is created in the working directory, and the target directory is the working directory.

The compiled HTML report is also stored in the target directory. If report is NULL, a filename is generated from the name of the subdirectory where the anonymized data are stored, followed by the suffix "-preprocessing.html". The preprocessing functions return the file path to this report.
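
The location rules above can be summarized in a small helper. output_locations() is a hypothetical function written for illustration only (it is not part of truthiness), and default_subdir is an assumed placeholder, because the name of the automatically created subdirectory is not documented here.

```r
# Sketch of the output-location rules for preprocess()/preprocess_simulated().
# HYPOTHETICAL: output_locations() is illustrative only, and default_subdir
# stands in for the undocumented name of the auto-created subdirectory.
output_locations <- function(outpath = NULL, report = NULL,
                             default_subdir = "anon_data") {
  if (is.null(outpath)) {
    # no outpath: anonymized files go in a new subdirectory of the working dir
    outpath <- file.path(getwd(), default_subdir)
  }
  # target directory = the directory just above the anonymized-data directory
  target_dir <- dirname(outpath)
  if (is.null(report)) {
    # default report name: <anonymized subdirectory name>-preprocessing.html
    report <- file.path(target_dir,
                        paste0(basename(outpath), "-preprocessing.html"))
  }
  list(anon_dir = outpath,
       target_dir = target_dir,
       non_anon_files = file.path(target_dir,
                                  c("NOT_ANONYMIZED_sessions.rds",
                                    "NOT_ANONYMIZED_phases.rds")),
       report = report)
}

loc <- output_locations("results/anon")
loc$target_dir   # "results"
loc$report       # "results/anon-preprocessing.html"
```

With outpath = NULL, target_dir is simply the working directory, which is why the non-anonymized files and the report end up there by default.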

Users can add manual exclusions by editing the files manually_exclude_participants.csv and manually_exclude_phases.csv in the target directory; if these files do not exist, they are written to the target directory the first time the script is run. It is therefore wise to run the preprocessing script twice: once to create the files, so that the user can see how their entries should be structured, and again after filling in the entries to apply the manual exclusions.
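
A minimal sketch of this write-on-first-run, read-on-later-runs pattern follows. load_manual_exclusions() and its ID and reason columns are hypothetical; the actual column layout is whatever the preprocessing script writes on its first run.

```r
# Sketch of the "write a template on first run, read it on later runs"
# pattern for manual exclusions.
# HYPOTHETICAL: the function and its columns (ID, reason) are illustrative;
# run the real preprocessing script once to see the actual structure.
load_manual_exclusions <- function(target_dir,
                                   fname = "manually_exclude_participants.csv") {
  f <- file.path(target_dir, fname)
  if (!file.exists(f)) {
    # first run: write an empty template for the user to fill in
    template <- data.frame(ID = character(0), reason = character(0))
    write.csv(template, f, row.names = FALSE)
  }
  read.csv(f, stringsAsFactors = FALSE, colClasses = "character")
}

td <- tempfile()
dir.create(td)
first <- load_manual_exclusions(td)   # first run writes the empty template
nrow(first)                           # 0: nothing excluded yet

# the user edits the file (simulated here with cat() and a made-up entry)
cat('"ID","reason"\n"S007","example entry"\n',
    file = file.path(td, "manually_exclude_participants.csv"))
second <- load_manual_exclusions(td)  # second run picks up the entry
second$ID
```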

The import_* and read_* functions are not intended to be called directly; instead, the user will typically call preprocess or preprocess_simulated, or render the R Markdown preprocessing template (obtained with rmarkdown::draft). These lower-level functions are invoked by the higher-level ones and are documented here for completeness.

The import_* functions extract session, phase, category-judgment, or rating data from the full set of raw data files in the directory given by path and return a (non-anonymized) data frame with the corresponding data. They do this by calling the corresponding read_* function on each individual input file in that directory, then transforming and combining the information as required.
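
The division of work between read_* and import_* can be sketched as a per-file reader plus an importer that applies it to every file in a directory and stacks the results. read_one_file() and import_all_files() are hypothetical stand-ins, the file contents are invented, and the study-specific transformations performed by the real functions are omitted.

```r
# Sketch of the read/import split: a per-file reader plus a directory-level
# importer that applies it to every raw file and combines the results.
# HYPOTHETICAL: these functions, column names, and file contents are
# illustrative; the real read_*/import_* functions also transform the data.
read_one_file <- function(path) {
  d <- read.csv(path, stringsAsFactors = FALSE)
  d$source_file <- basename(path)  # record which raw file each row came from
  d
}

import_all_files <- function(path) {
  files <- list.files(path, pattern = "\\.csv$", full.names = TRUE)
  do.call(rbind, lapply(files, read_one_file))
}

raw_dir <- tempfile()
dir.create(raw_dir)
write.csv(data.frame(ID = c("a", "b"), score = 1:2),
          file.path(raw_dir, "P1L1.csv"), row.names = FALSE)
write.csv(data.frame(ID = "c", score = 3L),
          file.path(raw_dir, "P1L2.csv"), row.names = FALSE)

combined <- import_all_files(raw_dir)
combined
```

Keeping the per-file reader separate makes it possible to inspect a single raw file in isolation, much as read_sessions_simulated is used on one file in the examples.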


td_raw <- tempfile()  # temp dir for raw data
td_anon <- tempfile() # temp dir for preprocessed data

## simulate data and preprocess it

set.seed(62)
simulate_resp_files(40, path = td_raw, overwrite = TRUE)


## run the built-in R Markdown script
tf1 <- tempfile(fileext = ".html") # temporary file for report
report <- preprocess_simulated(td_raw, td_anon, tf1)

if (interactive()) browseURL(report) # view the HTML preprocessing report

file.remove(report) # clean up


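## low-level functions (normally called by preprocess_simulated)
sess <- import_sessions_simulated(td_raw) # session data from all raw files
sess_p1 <- read_sessions_simulated(file.path(td_raw, "P1L1.csv")) # a single file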

# clean up temp files
unlink(td_raw, recursive = TRUE, force = TRUE)
unlink(td_anon, recursive = TRUE, force = TRUE)