```r
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```r
library(datasetjson)
```
Some users may be interested in converting SAS Version 5 Transport (XPT) files into Dataset JSON, or converting from other file types such as SAS7BDAT. There may be existing processes in place to do this in SAS, such as a post-processing step where files are converted in bulk. This vignette offers some guidance and best practices for doing this in R.
Ideally, if you're converting from an XPT or SAS7BDAT file, the appropriate metadata would be maintained separately from the dataset itself. Inevitably, some information is lost in the exchange because of differences in what metadata each format carries and how R interprets SAS datasets. This is one reason why {datasetjson} improves upon XPT as an exchange format. In lieu of external metadata, here's an example of how you can make a best-effort conversion.
```r
adsl <- haven::read_xpt(file.path(system.file(package='datasetjson'), "adsl.xpt"))

#' Gather variable metadata in Dataset JSON compliant format
#'
#' @param n Variable name
#' @param .data Dataset to gather attributes
#'
#' @returns Columns compliant data frame
extract_xpt_meta <- function(n, .data) {
  attrs <- attributes(.data[[n]])

  out <- list()

  # Identify the variable type
  if (inherits(.data[[n]], "Date")) {
    out$dataType <- "date"
    out$targetDataType <- "integer"
  } else if (inherits(.data[[n]], "POSIXt")) {
    out$dataType <- "datetime"
    out$targetDataType <- "integer"
  } else if (inherits(.data[[n]], "numeric")) {
    if (any(is.double(.data[[n]]))) out$dataType <- "float"
    else out$dataType <- "integer"
  } else if (inherits(.data[[n]], "hms")) {
    out$dataType <- "time"
    out$targetDataType <- "integer"
  } else {
    out$dataType <- "string"
    out$length <- max(purrr::map_int(.data[[n]], nchar))
  }

  out$itemOID <- n
  out$name <- n
  out$label <- attr(.data[[n]], 'label')
  out$displayFormat <- attr(.data[[n]], 'format.sas')
  tibble::as_tibble(out)
}

# Loop the ADSL columns
adsl_meta <- purrr::map_df(names(adsl), extract_xpt_meta, .data = adsl)
adsl_meta
```
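Note that `haven::read_xpt()` stores the SAS display format in the `format.sas` attribute and the variable label in the `label` attribute, which is what `extract_xpt_meta()` picks up for `displayFormat` and `label`.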
Now that we have the metadata, we can use this to write out the Dataset JSON file.
```r
# Create the datasetjson object
ds_json <- dataset_json(
  adsl,
  item_oid = "ADSL",
  name = "ADSL",
  dataset_label = attr(adsl, 'label'),
  columns = adsl_meta
)

# Write the JSON
json_file_content <- write_dataset_json(ds_json)
```
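Because no file path was given, `write_dataset_json()` returns the JSON text itself here. If you'd rather write straight to disk, you can pass a file path instead (the output path below is just an example):

```r
# Write the JSON to a file instead of returning the text
write_dataset_json(ds_json, file = "adsl.json")
```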
Just for good measure, we can confirm the metadata we just created is compliant with the schema.
```r
# Check schema compliance
validate_dataset_json(json_file_content)
```
If your intention is to convert files into Dataset JSON format in bulk, there are a couple of things you should consider, especially if you're trying to replicate existing procedures done using SAS:
Remember that R holds data in memory by default, whereas SAS work datasets are written to disk. This means that for a bulk conversion of datasets, you'll want to read in and write out one dataset at a time. For example:
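Here's a minimal sketch of what that might look like, reusing the `extract_xpt_meta()` helper defined above. The directory paths are placeholders, and `convert_xpt_to_json()` is a hypothetical wrapper written for this example, not a {datasetjson} function:

```r
# A minimal sketch of a one-dataset-at-a-time bulk conversion.
# The paths are placeholders; convert_xpt_to_json() is a
# hypothetical wrapper, not part of {datasetjson}.
convert_xpt_to_json <- function(xpt_path, out_dir) {
  df <- haven::read_xpt(xpt_path)
  ds_name <- toupper(tools::file_path_sans_ext(basename(xpt_path)))

  # Rebuild the column metadata using the helper defined above
  meta <- purrr::map_df(names(df), extract_xpt_meta, .data = df)

  ds_json <- dataset_json(
    df,
    item_oid = ds_name,
    name = ds_name,
    dataset_label = attr(df, 'label'),
    columns = meta
  )

  # Write straight to disk so only one dataset is in memory at a time
  write_dataset_json(ds_json, file = file.path(out_dir, paste0(tolower(ds_name), ".json")))
}

for (f in list.files("path/to/xpt", pattern = "\\.xpt$", full.names = TRUE)) {
  convert_xpt_to_json(f, out_dir = "path/to/json")
}
```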
It's likely best to wrap the conversion process in a function, as shown above, because objects created inside the function go out of scope when it returns, allowing R's garbage collector to release that memory. Depending on the size of your data, it could be easy to exhaust memory if you first read in all of the datasets.
A second consideration is the use of the function `validate_dataset_json()`. Particularly if you have large datasets, we recommend against using this function in a bulk conversion. The validation it performs is against the Dataset JSON schema, so it's not performing additional CDISC conformance checks on the data. We offer this function primarily because the schema is available, and it allows you to check schema compliance if necessary. That said, we've done the testing to make sure that `write_dataset_json()` writes files compliant with the schema, so this step is redundant.