bllflow supports metadata from DDI and from the R packages hmisc and rjlabelled.
The Data Documentation Intitiatve (DDI) is is an international standard for describing the data produced by surveys and other observational methods in the social, behavioral, economic, and health sciences. There is metadata for variables labels, but also important data provenance information.
Get the description of the study data from DDI and add the description to bllflow object
library(survival) # for pbc data data(pbc) library(bllflow)
pbc_DDI <- read_DDI(file.path(getwd(), "../inst/extdata"), "pbcDDI.xml")
Your metadata is now stored in the pbcDDI
object in two formats:
1) Variable and format labels can be accessed in pbcDDI$variableMetaData
.
All DDI metadata is brought into the pbcDDI
object. For example, variable labels can
be accessed as follows:
str(pbc_DDI$variableMetaData$varlab) # variable labels str(pbc_DDI$variableMetaData$vallab) # value labels for categorial variables
variableMetaData
is generated using the DDIwR
package.
2) All metadata from the DDI document pbcDDI$ddiObject
. Careful
Get the varialbe metadata for variableDetails sheet from DDI and add
that the bllFlow MSW-DDI object. Careful printing: there can be a lot
of information! And there can be a lot of nested lists.
Get the name of the data.
cat("Dataset name: \n") pbc_DDI$ddiObject$codeBook$docDscr$citation$titlStmt$titl
The MSW variables
and variable_details.csv
file contains the variables in your model.
Use the BLLFlow()
function with the MSW files as attributes to add labels to the
BLLFlow object, (variables
and variable_details
and the DDI object (pbc_DDI
).
# read the MSW files variables <- read.csv(file.path(getwd(), '../inst/extdata/PBC-variables.csv')) variable_details <- read.csv(file.path(getwd(), '../inst/extdata/PBC-variableDetails.csv')) # create a BLLFlow object and add labels. pbc_model <- BLLFlow(pbc, variables, variableDetails, pbcDDI)
Metadata is added to variables
and variable_details
. If labels and other metadata
already were in the files, the DDI metadata is added to start_label
, start_type
,
cat_start_value
, cat_start_label
, start_low
, start_high
--- if that data is in
the DDI file.
cat("Variable labels in the original MSW\n") variables$label cat("\nNo variable labels from DDI in variable details\n") variable_details$label cat("\nDDI variable lables added to variableDetailsWithDDI\n") pbc_model$populated_variable_details$label
Use write_DDI_populated_MSW()
to create a MSW variable_details
CSV file.
(change to new name variable_details_with_DDI()
) using variable_details_with_DDI()
.
Then export using write_DDI_populated_MSW()
as a CSV file to help further develop
your study protocol.
write_DDI_populated_MSW()
is a handy function at the beginning of your study.
First, identify thes ariable you need in your study. Then create a 'bane bones'
MSW variables
CSV sheet, import into R and BLLFlow()
to add additional
information from the DDI reference file such as labels, variable types, categories.
Alternatively, 1) directly add metadata to variables in a CSV file (see general utility functions in Example X); or, 2) export a CSV with metadata from an R variable list and a DDI file.
write_DDI_populated_MSW(pbcModel, "../inst/extdata/", "new_MSW_variable_details.csv")
Creates a directory and file (if they don't already exists). new_MSW_variable_details.csv
will be overwritten if the file already exists. new_MSW_variable_details.csv
is
created from variable_details_with_DDI
. First create variable_details_with_DDI
with either BLLFLow()
or get_DDI_variables()
.
Create a new file (new_name.csv
) directly from an existing variable_details.csv.
# Writing does not work with pkgdown howver evaluation works on user machine #WriteDDIPopulatedMSW(pbcDDI, "../inst/extdata/", # "newMSWvariableDetails.csv", # "nonBLLPopulated.csv")
Do we need or want this? Rusty, can you write in a vignette? Replaces the models msw with all or either variables or variableDetails. Passing variableDetails also updates the populatedVariableDetails
pbc_model <- update_MSW(pbc_model, variables, variable_details)
Described previously,read_DDI()
returns a dataframe with all DDI metadata from
a DDI.xml file.
There two additional DDI utility functions that return the two main parts of the DDI file.
get_DDI_description()
returns a dataframe with DDI 'header' information from a DDI file.
The header information includes the dataset ID, name, creatation date and other
important information to preserve the provenance of your study data.
get_DDI_variables()
returns a dataframe with variable DDI metadata for a list of
variables. DDI variable metadata is typical codebook information such as labels,
types, valid and invalid categories, category labels, and descriptive statistics,
such as number valid responses, missing responses, minimum, maximum and mean values.
The bllflow workflow creates an ongoing, updated codebook as you perform your study. The oringial data DDI metadata is the starting point for keeping the updated codebook. The bllflow then creates and modifies an new instance of the codebook as you perform analyses. In addition, a log is created that describes the data cleaning and data transformation steps.
get_DDI_description(DDI_path = ddi_path, DDI_file = ddi_path)
would not add the info to the bllFlow object.
pbc_DDI <- read_DDI(file.path(getwd(), "../inst/extdata"), "pbcDDI.xml") # all the DDI metadata vars_for_pbc <- get_DDI_variables(pbc_DDI, c("age", "sex")) # all the variable metadata (but no study description infomation) descr_for_pbc <- get_DDI_description(pbc_DDI) # all the study description metadata (but no variable infomation)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.