`RDataXMan` requires a specific folder structure for the functions to work:
    Working directory
    |-- public_data
    |-- public_data_template
    `-- research
        |-- research folder1 (e.g., requestnum001)
        |   |-- private_data
        |   |-- private_data_template
        |   |-- request_input
        |   `-- request_output
        |-- research folder2
        `-- ...
`RDataXMan` package

The workflow using `RDataXMan` is illustrated using an arbitrary example.
This step creates the folder structure required by the package, as illustrated in the diagram above. Users may create the required subfolders manually, but are strongly encouraged to use the functions `initWkdir` and `initResearchFolder` for convenience.
Users may select an existing folder as the working directory, or give the path to a new folder, in which case a new working directory will be created.
    library(RDataXMan)
    initWkdir(wkdir = "Working directory")
    initResearchFolder(wkdir = "Working directory", research.folder = "requestnum001")
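For readers who prefer to create the subfolders manually, the layout in the diagram can be sketched in base R as follows. This only illustrates the folder structure described above and is not the package's own implementation; the helper name `make_layout` is hypothetical, and a temporary directory is used so that nothing is written to a real project.

```r
# Hypothetical helper: builds the folder layout shown in the diagram.
# This is NOT how initWkdir() or initResearchFolder() are implemented.
make_layout <- function(wkdir, research.folder) {
  research_dir <- file.path(wkdir, "research", research.folder)
  dirs <- c(
    file.path(wkdir, c("public_data", "public_data_template")),
    file.path(research_dir, c("private_data", "private_data_template",
                              "request_input", "request_output"))
  )
  # recursive = TRUE also creates wkdir and wkdir/research if they are missing
  for (d in dirs) dir.create(d, recursive = TRUE, showWarnings = FALSE)
  invisible(dirs)
}

# Use a temporary location for the demonstration
wkdir <- file.path(tempdir(), "Working directory")
created <- make_layout(wkdir, "requestnum001")

# Check that every expected subfolder now exists
all(dir.exists(created))
```

Calling the helper again with a different `research.folder` adds another research folder under the same working directory.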
After setting up the working directory and research folder, all public datasets that are saved as flat files should be stored in the subfolder `public_data` of the working directory, and all flat tables that are meant to be accessible only to this particular research project should be stored in the subfolder `private_data` of the selected research folder.
    genInclusion(wkdir = "Working directory", research.folder = "requestnum001",
                 table_name, data.type, database, key.var, key.desc,
                 identifier.var, count, overwrite, username, password)
    genVariable(wkdir = "Working directory", research.folder = "requestnum001",
                table_name, data.type, database, identifier.var, omit.var,
                overwrite, username, password)
When using flat tables, parameter `data.type` should take the value `public` or `private`, to indicate whether the table is saved in `public_data` of the working directory or in `private_data` of the specified research folder. If linking to an actual database, `data.type` should be the type of database to link to, i.e., `SQL`.
The Excel files generated by `genInclusion` and `genVariable` will be saved into the `private_data_template` folder if `data.type` is `private`, and into `public_data_template` otherwise.
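The rule for where a generated request form ends up can be written out as a small base R function. This is a sketch of the rule as stated in the text, assuming the folder layout from the diagram; `template_dir` is a hypothetical name and not a function of the package.

```r
# Hypothetical helper: returns the template folder implied by data.type,
# following the rule described in the text (not taken from package code).
template_dir <- function(wkdir, research.folder, data.type) {
  if (identical(data.type, "private")) {
    # Private templates live inside the research folder
    file.path(wkdir, "research", research.folder, "private_data_template")
  } else {
    # Any other data.type maps to the shared public template folder
    file.path(wkdir, "public_data_template")
  }
}

private_path <- template_dir("Working directory", "requestnum001", "private")
public_path  <- template_dir("Working directory", "requestnum001", "public")
private_path
public_path
```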
After generating the request forms, users need to indicate their selection in each form and save the modified copies in the `request_input` folder within the research folder. See the sections below for how to make selections in request forms. Note that the package requires users to leave the original copies of the request forms in the template folders.
After filling in all request forms and saving them in the `request_input` folder, data extraction can be done using the function `extract_data`:
    extract_data(wkdir = "Working directory", research.folder = "requestnum001",
                 inclusion.xls.file, dataLogic, variable.xls.file,
                 select.output, overwrite, username, password)
`inclusion.xls.file` and `variable.xls.file` can be vectors if multiple inclusion criteria and/or variable lists are to be used for extraction. Users must specify `dataLogic` to be either `intersection` or `union` for multiple inclusion criteria.
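The difference between the two `dataLogic` options can be illustrated with base R set operations. This sketch assumes that the options combine the identifier lists of the individual inclusion criteria as a set intersection or a set union; the identifiers below are made up, and the code does not call the package.

```r
# Made-up identifier lists, one per inclusion request form
ids_form1 <- c("P001", "P002", "P003")
ids_form2 <- c("P002", "P003", "P004")
id_lists  <- list(ids_form1, ids_form2)

# "intersection": identifiers that satisfy every inclusion criterion
ids_intersection <- Reduce(intersect, id_lists)

# "union": identifiers that satisfy at least one inclusion criterion
ids_union <- Reduce(union, id_lists)

ids_intersection  # "P002" "P003"
ids_union         # "P001" "P002" "P003" "P004"
```

Using `Reduce` keeps the sketch valid for any number of inclusion criteria, not just two.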
`select.output` indicates the output files to be generated by the function, where multiple selections can be indicated by a vector:

- `1`: generate lists of identifier variables from merged inclusion criteria.
- `2`: generate an Excel file with summary statistics for both inclusion criteria and variable lists.
- `3`: generate `csv` files with data extracted based on each request form.
- `4`: generate a single `csv` file for the final merged data.

Selected output files will be written into the `request_output` folder within the research folder, and are also returned as a list of `data.frame`s in the R environment, except for `2`.
In the final merged data, if there are multiple columns with the same name, each column name will be appended with the index of its corresponding variable list in the `variable.xls.file` argument.
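The renaming rule can be pictured with a small base R example. The data are made up, and the `_<index>` suffix format is an assumption made only for illustration; the text does not state how the package formats the appended index.

```r
# Made-up extracts from two variable lists that share the column "DATE"
extract1 <- data.frame(ID = c("P001", "P002"),
                       DATE = c("2010-01-05", "2011-03-12"))
extract2 <- data.frame(ID = c("P001", "P002"),
                       DATE = c("2012-07-30", "2013-09-01"))
extracts <- list(extract1, extract2)

# Column names (other than the identifier) that occur in more than one extract
all_names <- unlist(lapply(extracts, function(d) setdiff(names(d), "ID")))
dup_names <- unique(all_names[duplicated(all_names)])

# Append the index of the variable list to each duplicated name.
# The "_<index>" form is an assumption; the package may format it differently.
for (i in seq_along(extracts)) {
  hit <- names(extracts[[i]]) %in% dup_names
  names(extracts[[i]])[hit] <- paste0(names(extracts[[i]])[hit], "_", i)
}

# Merge the renamed extracts on the identifier
merged <- Reduce(function(a, b) merge(a, b, by = "ID"), extracts)
names(merged)  # "ID" "DATE_1" "DATE_2"
```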
Filled-in request forms must be saved in the `request_input` folder, and their file type should be "Excel 97-2003 Workbook" (i.e., with the `xls` extension). Users are hence strongly encouraged to modify the request forms and then use the `Save As...` option of Excel to save the modified versions in the `request_input` folder.

In this section we use an example to illustrate how to specify inclusion criteria by filling in the sheet "overall list" in long or short format.
Suppose we want to select all patients in a dataset as the starting cohort and eventually reduce the sample with other inclusion criteria. In this case, patient ID (denoted as patient NRIC in the example data) should be used as both the key variable and the identifier variable. There are two ways to specify such a selection: one via the `selection` column of the request form, the other via the `logic` column.
`selection` column

The "overall list" sheet contains all ICD records in the current database. The `remarks` column (i.e., Column C) is for notes and comments. To select all patients, we put an `x` in every row of the `selection` column of the sheet `overall list`.
`logic` column

Instead of going through each and every row to select all patients, users may also use an R logical expression as a shortcut by writing `!is.na(PATIENT_NRIC)` in the first row of the `logic` column, meaning that any row with a valid patient NRIC, effectively every row, should be selected.
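The effect of this expression can be checked in plain R on a small made-up table. The sketch assumes that the text in the `logic` cell is evaluated as an R expression with the data columns in scope; how the package actually evaluates it is not shown in this document.

```r
# Made-up data; the PATIENT_NRIC values are placeholders, not real identifiers
dat <- data.frame(PATIENT_NRIC = c("A01", "A02", NA, "A04"),
                  DIAGNOSIS_CD = c("174.0", "250.0", "174.9", "401.1"),
                  stringsAsFactors = FALSE)

# The expression as it would be typed in the first row of the logic column
logic <- "!is.na(PATIENT_NRIC)"

# Evaluate the text with the data columns in scope (assumed mechanism)
keep <- eval(parse(text = logic), envir = dat)

# Rows with a missing patient NRIC are dropped, all others are kept
dat[keep, ]
```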
Users must make sure that the logical statement follows valid R syntax, and need to pay attention to the data type of the key variable. For instance, `as.numeric(DIAGNOSIS_CD) >= 174 & as.numeric(DIAGNOSIS_CD) < 175` is advised instead of `DIAGNOSIS_CD >= 174 & DIAGNOSIS_CD < 175` if the inclusion criterion is to select any ICD code from 174.0 to 174.9 and users are not sure whether the column for ICD code is numeric. Note that only the first row of this column will be interpreted.
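The reason for the explicit conversion can be seen in base R with a few made-up codes stored as text. When a character column is compared with a number, R compares the values as strings, so a code such as `1740` can satisfy the condition even though it lies outside the intended range.

```r
# Made-up ICD codes stored as character strings
codes <- data.frame(DIAGNOSIS_CD = c("174.0", "174.9", "1740", "175.0"),
                    stringsAsFactors = FALSE)

# Numeric comparison after explicit conversion
num_keep <- with(codes,
                 as.numeric(DIAGNOSIS_CD) >= 174 & as.numeric(DIAGNOSIS_CD) < 175)

# Same condition without conversion: the numbers are coerced to text
# and the comparison is done on strings
chr_keep <- with(codes, DIAGNOSIS_CD >= 174 & DIAGNOSIS_CD < 175)

# "1740" is excluded by the numeric version but selected by the text version
data.frame(codes, num_keep, chr_keep)
```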
In this section we use an example to illustrate how to select variables for the final output by filling in the sheet "overall list" of a variable list in long or short format.
To select specific variables listed in `variable` (Column B), put an `x` in the corresponding row in Column D, and leave Column D blank if a variable is not wanted.
Suppose we are interested in the race, birth year and death indicator of patients. The sheet "overall list" of the variable list should then have an `x` in Column D for each of these three variables, with all other rows left blank.
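How such a filled-in sheet translates into a list of wanted variables can be sketched in base R. The table below is a made-up stand-in for the sheet: only the column name `variable` comes from the text, while `selection` as the name of Column D and the variable names themselves are assumptions for illustration.

```r
# Made-up stand-in for the "overall list" sheet of a variable list.
# "selection" is an assumed name for Column D; variable names are invented.
var_sheet <- data.frame(
  variable  = c("RACE", "BIRTH_YEAR", "GENDER", "DEATH_INDICATOR"),
  selection = c("x", "x", NA, "x"),
  stringsAsFactors = FALSE
)

# Variables whose Column D cell contains an x; blank cells are skipped
marked <- !is.na(var_sheet$selection) & var_sheet$selection == "x"
wanted <- var_sheet$variable[marked]
wanted  # "RACE" "BIRTH_YEAR" "DEATH_INDICATOR"
```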