Functions to import and preprocess raw (or simulated) data.
preprocess(path, outpath = NULL, report = NULL)
preprocess_simulated(path, outpath = NULL, report = NULL)
import_sessions(path)
import_sessions_simulated(path)
import_phase_info(path)
import_phase_info_simulated(path)
import_cjudgments_simulated(path)
import_cjudgments(path)
import_tratings(path)
import_tratings_simulated(path)
read_sessions(path)
read_sessions_simulated(path)
read_cjudgments(path)
read_cjudgments_simulated(path)
read_tratings(path)
read_tratings_simulated(path)
path: Path to the directory containing raw data files.
outpath: Path to the directory where anonymized data will be saved.
report: Filename of the HTML preprocessing report.
The purpose of these functions is to import, transform,
and anonymize raw data files from the Truth Trajectory study by
Henderson, Simons, and Barr (2021). As
few users other than the researchers will have access to the
original non-anonymized data, functions are also supplied to
perform the same set of actions on simulated data. There are two
versions of each function: an original version (e.g., preprocess)
and a simulated version (e.g., preprocess_simulated). We include
two sets of functions
because the simulated functions were built during the planning
stage of the study, based on assumptions about the structure of
the raw data files that turned out to be incorrect once we
obtained pilot data. Rather than laboriously rewrite the
simulation functions to match the new data structure, we decided
to preserve the old functions and split them off from the new
versions. Both versions perform the same set of actions and yield
the same end products, but import and transform the data differently
because of the differing nature of the raw data files.
The "preprocess" functions are the high-level functions and the only ones that most users will need. The "import" and "read" functions are lower-level functions that are called by the "preprocess" functions, and are described here for completeness.
A string with the path to the generated HTML report.
Generally, users will not have access to the non-anonymized raw
data and so will not need to use any of these functions, except
when working with simulated data. The data objects resulting from
the preprocessing of the original raw data are available as
built-in data objects documented in truth_trajectory_data. Users
interested in reproducing the results from the anonymized data
should start with the documentation for reproduce_analysis.
The preprocess functions load the data from the raw data
files and write out (1) non-anonymized, preprocessed data files;
(2) anonymized, preprocessed data files; and (3) an HTML report. They
perform these actions by running scripts derived from R Markdown
templates included in the package. It is not necessary to view
these scripts, but if you wish to do so, use rmarkdown::draft;
RStudio users can also access the
templates from the "New File > R Markdown" pull-down menu by
selecting the appropriate template in the dialog box.
To access the preprocessing script for simulated data:
rmarkdown::draft("preprocessing-simulated.Rmd",
"illusory-truth-preprocessing-sim", "truthiness")
and the preprocessing script for real data:
rmarkdown::draft("preprocessing.Rmd",
"illusory-truth-preprocessing", "truthiness")
The preprocessing script outputs four anonymized data files into the
subdirectory named in the outpath argument. For maximum
portability, each file is stored in two versions: binary (RDS)
format as well as comma-separated values (CSV). These files are
called ANON_sessions, ANON_phases, ANON_categories, and
ANON_ratings, and the data they contain are described in the
codebook.
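The two storage formats can be illustrated with base R alone. The sketch below is not taken from the package: the toy data frame, its column names, and the .rds/.csv file extensions on ANON_sessions are assumptions made for illustration.

```r
# Toy stand-in for an anonymized table; the columns are hypothetical.
toy <- data.frame(
  id = 1:3,
  list_id = factor(c("A", "B", "A")),
  stringsAsFactors = FALSE
)

out_dir <- tempfile()  # stand-in for the outpath subdirectory
dir.create(out_dir)

# Write the same table in both formats (file extensions are assumed).
rds_file <- file.path(out_dir, "ANON_sessions.rds")
csv_file <- file.path(out_dir, "ANON_sessions.csv")
saveRDS(toy, rds_file)
write.csv(toy, csv_file, row.names = FALSE)

# RDS restores the R object exactly, including the factor column.
from_rds <- readRDS(rds_file)
# CSV can be opened by other software, but column types are
# re-inferred by R when the file is read back in.
from_csv <- read.csv(csv_file, stringsAsFactors = FALSE)

unlink(out_dir, recursive = TRUE)
```

The RDS version is the more convenient one inside R because it preserves column types; the CSV version trades that away for readability outside R.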
In addition to the anonymized data, the preprocessing scripts
output two files with non-anonymized data. These files contain
sensitive information (Prolific IDs and answers to open-ended
questions) and are named NOT_ANONYMIZED_sessions.rds and
NOT_ANONYMIZED_phases.rds. They are written to the
"target directory", which is the directory just above the
subdirectory with the anonymized data as specified by
outpath; if outpath is NULL, then a
subdirectory is created in the working directory for the anonymized
files and the target directory will be the working directory. The
compiled HTML report is also stored in the target directory. If the
report filename is not specified by the user (report is NULL), then
one is generated, with a prefix corresponding to the name of the
subdirectory where the anonymized data is stored, and the suffix
"-preprocessing.html". The return value of the preprocessing
function is the file path to this report.
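The naming rule for the report can be sketched in a few lines of base R. The helper default_report_path below is hypothetical and only mirrors the description above; it is not the package's implementation, and it assumes that outpath is not NULL.

```r
# Hypothetical helper: derive the target directory and the default report
# path from outpath, following the naming rule described in the text.
default_report_path <- function(outpath) {
  target_dir <- dirname(outpath)  # directory just above outpath
  prefix <- basename(outpath)     # name of the anonymized-data subdirectory
  file.path(target_dir, paste0(prefix, "-preprocessing.html"))
}

example_path <- default_report_path(file.path("study", "anon_data"))
example_path
# "study/anon_data-preprocessing.html"
```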
Users can manually add exclusions by editing the files
manually_exclude_participants.csv and
manually_exclude_phases.csv in the target directory; if they
don't exist, then they will be written to the target directory when
the script is first run. Thus, it is wise to run the preprocessing
script twice: once to create the files so that the user can see how
the entries in these files should be structured, and once again
after filling in the data to apply the manual exclusions.
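The create-on-first-run, apply-on-second-run pattern might look roughly like the base R sketch below. It only illustrates the pattern: the helper read_or_create_exclusions and the column names id and reason are placeholders, and the real files may be structured differently (which is why running the script once to inspect them is recommended).

```r
# Hypothetical helper: write an exclusion file containing only a header row
# if it does not exist yet, then return the file's current contents.
# The column names id and reason are placeholders, not the package's.
read_or_create_exclusions <- function(target_dir, filename) {
  path <- file.path(target_dir, filename)
  if (!file.exists(path)) {
    template <- data.frame(id = character(0), reason = character(0))
    write.csv(template, path, row.names = FALSE)
  }
  read.csv(path, stringsAsFactors = FALSE)
}

target_dir <- tempfile()
dir.create(target_dir)
excl_name <- "manually_exclude_participants.csv"

# First run: the file is missing, so an empty template is written.
first <- read_or_create_exclusions(target_dir, excl_name)

# The user then adds entries by hand (simulated here by appending a row).
write.table(data.frame(id = "P001", reason = "example reason"),
            file.path(target_dir, excl_name),
            sep = ",", append = TRUE, col.names = FALSE, row.names = FALSE)

# Second run: the existing file is read, and its entries can be applied.
second <- read_or_create_exclusions(target_dir, excl_name)

unlink(target_dir, recursive = TRUE)
```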
The import_* and read_* functions are not
intended to be called directly; instead, the user will typically
call the preprocess or preprocess_simulated
function, or render the R Markdown preprocessing template (obtained
with rmarkdown::draft). These lower-level functions are invoked by
these higher-level functions, and are documented here for
completeness.
The import_* functions extract session, phase, category
judgments, or ratings data from the full set of raw data files in
subdirectory path and return a (non-anonymized) data frame
with the corresponding data. They do this by calling the
corresponding read_* function for each of the single input
files in the subdirectory, and transforming and combining the
information as required.
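The relationship between a read_* function (one file) and an import_* function (all files in a directory) can be sketched generically in base R. The functions read_one and import_all, the file names, and the columns below are made up for illustration and do not reproduce the package's code or the structure of the real raw data.

```r
# Hypothetical stand-in for a read_* function: parse a single file.
read_one <- function(file) {
  dat <- read.csv(file, stringsAsFactors = FALSE)
  dat$source_file <- basename(file)  # record which file each row came from
  dat
}

# Hypothetical stand-in for an import_* function: apply the reader to
# every file in the directory and stack the results into one data frame.
import_all <- function(path) {
  files <- list.files(path, pattern = "\\.csv$", full.names = TRUE)
  do.call(rbind, lapply(files, read_one))
}

# Demonstrate on two small made-up files.
raw_dir <- tempfile()
dir.create(raw_dir)
write.csv(data.frame(trial = 1:2, rating = c(4, 6)),
          file.path(raw_dir, "file_a.csv"), row.names = FALSE)
write.csv(data.frame(trial = 1:3, rating = c(2, 5, 7)),
          file.path(raw_dir, "file_b.csv"), row.names = FALSE)

combined <- import_all(raw_dir)
unlink(raw_dir, recursive = TRUE)
```

Keeping the per-file reader separate from the directory-level importer makes it possible to inspect a single problematic file without processing the whole directory.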
td_raw <- tempfile() # temp dir for raw data
td_anon <- tempfile() # temp dir for preprocessed data
## simulate data and preprocess it
set.seed(62)
simulate_resp_files(40, path = td_raw, overwrite = TRUE)
## run the built-in R Markdown script
tf1 <- tempfile(fileext = ".html") # temporary file for report
report <- preprocess_simulated(td_raw, td_anon, tf1)
browseURL(report) # view the HTML preprocessing report
file.remove(report) # clean up
sess <- import_sessions_simulated(td_raw)
sess_p1 <- read_sessions_simulated(file.path(td_raw, "P1L1.csv"))
# clean up temp files
unlink(td_raw, recursive = TRUE, force = TRUE)
unlink(td_anon, recursive = TRUE, force = TRUE)