Getting Started

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = " "
)

library(DT)

options(cli.num_colors = 1)
options(width = 60)
local({
  hook_output <- knitr::knit_hooks$get("output")
  knitr::knit_hooks$set(output = function(x, options) {
    if (!is.null(options$max.height)) {
      options$attr.output <- c(
        options$attr.output,
        sprintf('style="max-height: %s;"', options$max.height)
      )
    }
    hook_output(x, options)
  })
})
knitr::knit_hooks$set(output = function(x, options) {
  if (!is.null(options$max_height)) {
    paste('<pre style = "max-height:',
      options$max_height,
      '; float: left; width: 775px; overflow-y: auto;">',
      x, "</pre>",
      sep = ""
    )
  } else {
    x
  }
})
datatable_template <- function(input_data) {
  datatable(
    input_data,
    rownames = FALSE,
    options = list(
      autoWidth = FALSE,
      scrollX = TRUE,
      pageLength = 5,
      lengthMenu = c(5, 10, 15, 20)
    )
  ) %>%
    formatStyle(
      0,
      target = "row",
      color = "black",
      backgroundColor = "white",
      fontWeight = "500",
      lineHeight = "85%",
      fontSize = ".875em" # same as code
    )
}

Getting Started with xportr

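The demo uses a small ADSL dataset that ships with the xportr package. As the sections below show, it contains data types other than character and numeric, is missing variable labels and formats, and its variable order does not follow the specification file.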

To create a fully compliant v5 xpt ADSL dataset that was developed in R, we will need to apply the 6 main functions of the xportr package: xportr_type(), xportr_length(), xportr_label(), xportr_order(), xportr_format() and xportr_write().

# Loading packages
library(dplyr)
library(labelled)
library(xportr)
library(readxl)

# Loading in our example data
data("adsl_xportr", package = "xportr")
datatable_template(adsl_xportr)

Preparing your Specification Files

To use the functions in {xportr} you will need an R data frame that contains your specification. After loading your spec sheets you will most likely need to pre-process them before they work with the xportr functions. Please see our example spec sheets in system.file(file.path("specs", "ADaM_spec.xlsx"), package = "xportr") to see the layout xportr expects.

var_spec <- read_xlsx(
  system.file(file.path("specs", "ADaM_spec.xlsx"), package = "xportr"),
  sheet = "Variables"
) %>%
  rename(type = "Data Type") %>%
  rename_with(tolower)

Below is a quick snapshot of the specification file pertaining to the ADSL data set, which we will make use of in the 6 {xportr} function calls below. Take note of the order, label, type, length and format columns.

var_spec_view <- var_spec %>%
  filter(dataset == "ADSL")

datatable_template(var_spec_view)
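
If no spreadsheet is at hand, the same structure can be built directly as a data frame. The sketch below is a hypothetical two-variable spec: the rows and values are invented for illustration and are not taken from ADaM_spec.xlsx. It only shows the lowercase column names this vignette works with after the renaming step (dataset, order, label, type, length, format), plus a variable column, which is assumed here to hold the variable names.

```r
# Hypothetical spec rows, invented for illustration only
toy_spec <- data.frame(
  dataset  = c("ADSL", "ADSL"),
  variable = c("USUBJID", "AGE"),
  label    = c("Unique Subject Identifier", "Age"),
  type     = c("text", "integer"),
  length   = c(11, 8),
  order    = c(1, 2),
  format   = c(NA_character_, NA_character_),
  stringsAsFactors = FALSE
)

# Keep only the rows for one dataset, the base R equivalent of
# filter(dataset == "ADSL")
toy_adsl_spec <- toy_spec[toy_spec$dataset == "ADSL", ]
names(toy_adsl_spec)
```

A spec shaped like this could then be passed wherever var_spec is used, provided its values match your data.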

xportr_type()

NOTE: We make use of str() to expose the attributes (length, labels, formats, type) of the datasets. We have suppressed these calls for the sake of brevity.

To be compliant with the transport v5 specification, an xpt file can only have two data types: character and numeric/dbl. Currently the ADSL dataset has chr, dbl, time, factor and date.

str(adsl_xportr)

Using xportr_type() and the supplied specification file, we can coerce the variables in the ADSL set to be either numeric or character.

adsl_type <- xportr_type(adsl_xportr, var_spec, domain = "ADSL", verbose = "message")

Now all appropriate types have been applied to the dataset as seen below.

str(adsl_type)
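
To make the idea of coercion concrete, here is a minimal base R sketch of reducing a data frame to the two allowed types. It is not xportr's implementation: the toy data and the target vector (standing in for the spec's type column) are assumptions made for illustration.

```r
# Toy data with the kinds of columns mentioned above
toy <- data.frame(
  SEX    = factor(c("F", "M")),
  TRTSDT = as.Date(c("2014-01-02", "2014-03-05")),
  AGE    = c(63L, 71L)
)

# Hypothetical target types, standing in for the spec's type column
target <- c(SEX = "character", TRTSDT = "numeric", AGE = "numeric")

# Coerce each column to either character or numeric
for (v in names(target)) {
  toy[[v]] <- if (target[[v]] == "character") {
    as.character(toy[[v]])
  } else {
    as.numeric(toy[[v]])
  }
}

sapply(toy, class)
```

Note that as.numeric() on a Date drops the Date class and leaves a plain day count, so this sketch only illustrates the type change itself.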

xportr_length()

Next we can apply the lengths from a variable-level specification file to the data frame. xportr_length() will identify variables that are missing from your specification file. The function will also report how many lengths have been applied successfully. Before we apply the lengths, let's verify that no lengths have been applied to the original data frame.

str(adsl_xportr)

No lengths have been applied to the variables, as seen in the printout: the lengths would appear in the attr() part of each variable. Let's now use xportr_length() to apply our lengths from the specification file.

adsl_length <- adsl_xportr %>% xportr_length(var_spec, domain = "ADSL", verbose = "message")
str(adsl_length)

Note the additional attr(*, "width")= after each variable with the width. These have been directly applied from the specification file that we loaded above!
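
The width attribute can also be set and inspected by hand, which may help when checking a single variable. The following is a base R sketch on a toy data frame; the lengths in toy_lengths are hypothetical and the loop is only an illustration of the attribute mechanism, not of how xportr_length() works internally.

```r
toy <- data.frame(
  USUBJID = c("01-701-1015", "01-701-1023"),
  AGE = c(63, 64)
)

# Hypothetical lengths, standing in for the spec's length column
toy_lengths <- c(USUBJID = 11, AGE = 8)

# Attach each length as a "width" attribute on the matching column
for (v in names(toy_lengths)) {
  attr(toy[[v]], "width") <- toy_lengths[[v]]
}

# Collect the widths of all columns into one named vector
vapply(toy, function(x) attr(x, "width", exact = TRUE), numeric(1))
```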

xportr_order()

Note that the order of the ADSL variables shown above does not match the order column of the specification file. A call to xportr_order() remedies this. In the result, SITEID and many other variables have moved to match the specification order. Variables not in the spec are moved to the end of the data and a message is written to the console.

adsl_order <- xportr_order(adsl_xportr, var_spec, domain = "ADSL", verbose = "message")
datatable_template(adsl_order)
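
The reordering rule described above (spec variables first, in spec order, then everything else) can be sketched in base R as follows. The toy data and the spec_order vector are assumptions for illustration; this is not xportr's own code.

```r
toy <- data.frame(
  AGE     = c(63, 64),
  EXTRA   = c("a", "b"),
  USUBJID = c("01", "02"),
  STUDYID = c("S1", "S1")
)

# Hypothetical variable names, already sorted by the spec's order column
spec_order <- c("STUDYID", "USUBJID", "AGE")

in_spec     <- intersect(spec_order, names(toy))
not_in_spec <- setdiff(names(toy), spec_order)

# Spec variables first, variables missing from the spec at the end
toy_ordered <- toy[, c(in_spec, not_in_spec), drop = FALSE]

if (length(not_in_spec) > 0) {
  message("Moved to the end (not in spec): ", paste(not_in_spec, collapse = ", "))
}

names(toy_ordered)
```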

xportr_format()

Now we apply formats to the dataset. These will typically be DATE9., DATETIME20. or TIME5., but many others can be used. Notice that the ADSL dataset has 8 date/time variables and they are missing formats. Here we take a peek at a few TRT variables, which have a NULL format.

adsl_fmt_pre <- adsl_xportr %>%
  select(TRTSDT, TRTEDT, TRTSDTM, TRTEDTM)

tribble(
  ~Variable, ~Format,
  "TRTSDT", attr(adsl_fmt_pre$TRTSDT, which = "format"),
  "TRTEDT", attr(adsl_fmt_pre$TRTEDT, which = "format"),
  "TRTSDTM", attr(adsl_fmt_pre$TRTSDTM, which = "format"),
  "TRTEDTM", attr(adsl_fmt_pre$TRTEDTM, which = "format")
)

Using xportr_format() we can apply the formats from the specification file to the dataset.

adsl_fmt <- adsl_xportr %>% xportr_format(var_spec, domain = "ADSL")
adsl_fmt_post <- adsl_fmt %>%
  select(TRTSDT, TRTEDT, TRTSDTM, TRTEDTM)

tribble(
  ~Variable, ~Format,
  "TRTSDT", attr(adsl_fmt_post$TRTSDT, which = "format"),
  "TRTEDT", attr(adsl_fmt_post$TRTEDT, which = "format"),
  "TRTSDTM", attr(adsl_fmt_post$TRTSDTM, which = "format"),
  "TRTEDTM", attr(adsl_fmt_post$TRTEDTM, which = "format")
)

NOTE: You can use attr(data$variable, which = "format") to inspect formats applied to a dataframe. The above output has these individual calls bound together for easier viewing.
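
Instead of writing one attr() call per variable, the format attributes of every column can be gathered in a single pass. The helper get_formats() below is hypothetical (it is not part of xportr), and the toy data and its DATE9. format are set by hand for illustration.

```r
toy <- data.frame(TRTSDT = as.Date("2014-01-02"), AGE = 63)
attr(toy$TRTSDT, "format") <- "DATE9."  # set by hand for this sketch

# Hypothetical helper: one format per column, NA where none is set
get_formats <- function(data) {
  vapply(data, function(x) {
    fmt <- attr(x, "format", exact = TRUE)
    if (is.null(fmt)) NA_character_ else as.character(fmt)
  }, character(1))
}

get_formats(toy)
```

Columns without a format show up as NA, which makes missing formats easy to spot.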

xportr_label()

Please observe that our ADSL dataset is missing many variable labels. Labels are sometimes lost when data pass through R functions. However, a CDISC-compliant dataset needs a label on every variable.

adsl_no_lbls <- haven::zap_label(adsl_xportr)

str(adsl_no_lbls)

Using the xportr_label() function we can take the specification file and label all the variables available. xportr_label() will produce a warning message if a variable in the dataset is not in the specification file.

adsl_lbl <- adsl_xportr %>% xportr_label(var_spec, domain = "ADSL", verbose = "message")
str(adsl_lbl)
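
A quick way to see which variables would still lack a label is to check the label attribute on each column. The sketch below uses base R on a toy data frame; the labels in toy_labels are hypothetical, and the loop illustrates the attribute mechanism rather than xportr_label() itself.

```r
toy <- data.frame(
  USUBJID = c("01", "02"),
  AGE     = c(63, 64),
  EXTRA   = c("a", "b")
)

# Hypothetical labels, standing in for the spec's label column
toy_labels <- c(USUBJID = "Unique Subject Identifier", AGE = "Age")

# Label every variable that appears in both the data and the spec
for (v in intersect(names(toy_labels), names(toy))) {
  attr(toy[[v]], "label") <- toy_labels[[v]]
}

# Variables that still have no label (here, the one missing from the spec)
has_no_label <- vapply(
  toy,
  function(x) is.null(attr(x, "label", exact = TRUE)),
  logical(1)
)
unlabelled <- names(toy)[has_no_label]
unlabelled
```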

xportr_write()

Finally, we arrive at exporting the R data frame object as an xpt file with xportr_write(). The xpt file will be written directly to your current working directory. To make it more interesting, we have put together all six functions with the magrittr pipe, %>%. A user can now apply types, lengths, variable labels, ordering and formats, and write out the final xpt file, in one pipe! Appropriate warnings and messages are written to the console for any potential issues before the file is sent to a standard clinical dataset validator application or to data reviewers.

adsl_xportr %>%
  xportr_type(var_spec, "ADSL", "message") %>%
  xportr_length(var_spec, "ADSL", verbose = "message") %>%
  xportr_label(var_spec, "ADSL", "message") %>%
  xportr_order(var_spec, "ADSL", "message") %>%
  xportr_format(var_spec, "ADSL") %>%
  xportr_write("adsl.xpt")

That's it! We now have an xpt file created in R with all appropriate types, lengths, labels, ordering and formats from our specification file. If you are interested in exploring more of the custom warnings and error messages, as well as more background on xpt generation, be sure to check out the Deep Dive User Guide.
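
One way to sanity-check an exported file is to read it back into R. The sketch below assumes the haven package is installed and uses its write_xpt() and read_xpt() functions on a toy data frame written to a temporary directory; it is an illustrative round trip, not part of the xportr workflow above.

```r
library(haven)

toy <- data.frame(
  USUBJID = c("01", "02"),
  AGE     = c(63, 64)
)

# Write a version 5 transport file to a temporary location
path <- file.path(tempdir(), "toy.xpt")
write_xpt(toy, path, version = 5)

# Read it back and compare structure with the original
back <- read_xpt(path)
names(back)
nrow(back)
```

The same read_xpt() call could be pointed at an exported file such as adsl.xpt to inspect the result with str().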

As always, we welcome your feedback. If you spot a bug, would like to see a new feature, or if any documentation is unclear - submit an issue on xportr's GitHub page.




xportr documentation built on Oct. 8, 2024, 1:08 a.m.