NOTE: The terms "vector" and "variable" are (mostly) used interchangeably in this document.
stype provides an extensible set of data types that extend certain R vector classes to be useful in a variety of analytic applications by providing vectors with:

- a context object with information about how the variable relates to a study design;
- a data_summary with relevant summary statistics of the variable;
- an updated data_summary when a vector is subset or modified in certain ways;
- … context.

The package relies heavily on the vctrs package, whose goals are:

- To propose vec_size() and vec_type() as alternatives to length() and class(); vignette("type-size"). These definitions are paired with a framework for type-coercion and size-recycling.
- To define type- and size-stability as desirable function properties, use them to analyse existing base functions, and to propose better alternatives; vignette("stability"). This work has been particularly motivated by thinking about the ideal properties of c(), ifelse(), and rbind().
- To provide a new vctr base class that makes it easy to create new S3 vectors; vignette("s3-vector"). vctrs provides methods for many base generics in terms of a few new vctrs generics, making implementation considerably simpler and more robust.

Each data type provided by stype (described in more detail below) has a constructor function whose name follows the pattern v_<type>. For example, v_binary creates binary ($\{0, 1\}$) data from R's logical type.
library(stype)
x <- v_binary(c(TRUE, TRUE, TRUE, FALSE))
str(x)
The v_binary data type prints 0s and 1s but the underlying data is logical:
x
vctrs::vec_data(x)
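The split between what is printed and what is stored can be sketched with vctrs alone. The example below is a minimal illustration, not stype's implementation: the class name toy_binary and the constructor new_toy_binary() are hypothetical, and only new_vctr() and vec_data() come from vctrs.

```r
library(vctrs)

# Hypothetical constructor: keep logical data under a custom S3 class.
new_toy_binary <- function(x = logical()) {
  stopifnot(is.logical(x))
  new_vctr(x, class = "toy_binary")
}

# Display the values as 0/1 without changing what is stored.
format.toy_binary <- function(x, ...) {
  as.character(as.integer(vec_data(x)))
}

b <- new_toy_binary(c(TRUE, TRUE, TRUE, FALSE))

shown  <- format(b)    # character vector of "1"/"0" used for display
stored <- vec_data(b)  # the underlying logical vector, class removed
```

Because only the format() method is customized, the stored values stay logical and anything that works on the bare data keeps working on the result of vec_data().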
The data type includes some useful utilities, such as pretty-printing certain parts of the description (here, the proportion) and a predicate function:
is_binary(x)
Certain math operations work and pull directly from the description where appropriate (rather than recomputing). Note these operations are still under development and should be used with caution:
mean(x)
sum(x)
# sum(x, x) # See? very experimental
Other math/arithmetic operations don't work:
# What do you mean you want to add binary and integer?
x + 2L
# R's base types are not so safe
vctrs::vec_data(x) + 2L
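The same contrast can be reproduced with a bare vctrs class, which may help explain where the strictness comes from. This is a sketch under the assumption that stype relies on vctrs' default arithmetic behaviour here; the class name toy_flag is hypothetical and is not part of stype.

```r
library(vctrs)

# Hypothetical class for illustration; this is not stype's v_binary.
flag <- new_vctr(c(TRUE, FALSE, TRUE), class = "toy_flag")

# The classed vector refuses `+` because no vec_arith() method exists for it.
classed_result <- tryCatch(flag + 2L, error = function(e) e)

# The bare logical data is silently coerced to integer by base R.
bare_result <- vec_data(flag) + 2L
```

In this sketch classed_result holds an error condition, while bare_result is an ordinary integer vector. A class opts in to arithmetic by defining vec_arith() methods, so the safe behaviour is the default.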
Logical operators work as one might expect:
!x
all(x)
any(x)
Here's where the real magic is: vectors can be combined and subset, and their attributes are maintained and updated along the way.
# vectors can be combined and ...
# subsetting maintains and updates attributes
c(x, !x[1:3])
# But ...
c(x, v_binary(context = context(purpose = purpose(study_role = "other"))))
The following table describes the proposed data types (not all of these may be available at this time). A -- indicates that the type inherits properties from the level above.
| v_<type> | prototype | support |
|--------------------------|--------------|--------------------|
| v_binary | logical | $\{0, 1\}$ |
| v_count | integer | $(0, 1, 2, \dots)$ |
| v_continuous | double | $\mathcal{R}$ |
| v_continuous_nonneg | double | $\mathcal{R}^+$ |
| v_nominal | factor | |
| v_ordered | ordered | |
| v_proportion | double | $[0, 1]$ |
stype vectors can be used as columns of a tibble:
library(dplyr)
library(tibble)

n <- 100

# Build a context with the given study role
make_context <- function(role){
  context(purpose = purpose(study_role = role))
}

# Ten binary covariates named x1, ..., x10
covariates <- purrr::map(
  .x = purrr::set_names(1:10, paste0("x", 1:10)),
  .f = ~ v_binary(as.logical(rbinom(n, 1, 0.25)), context = make_context("covariate"))
)

# Three outcomes of different types, plus the covariates spliced in as columns
dt <- tibble(
  y1 = v_binary(as.logical(rbinom(n, 1, 0.25)), context = make_context("outcome")),
  y2 = v_continuous_nonneg(runif(n, 1, 100), context = make_context("outcome")),
  y3 = v_continuous(rnorm(n), context = make_context("outcome")),
  !!! covariates
)

dt
Selecting columns based on data type:
dt %>% select_if(is_binary)
Selecting columns based on context:
dt %>% select_if(is_outcome)
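In dplyr 1.0.0 and later, select_if() is superseded by select() combined with where(), which accepts the same kind of predicate function. The sketch below shows the two forms side by side on base types only; the data frame and its column names are made up for illustration. Predicates such as is_binary or is_outcome would presumably slot into where() in the same way, but that is an assumption that has not been checked against stype.

```r
library(dplyr)

# Illustrative data with two logical columns and one double column.
df <- tibble::tibble(
  flag       = c(TRUE, FALSE, TRUE),
  score      = c(0.2, 1.5, 3.1),
  other_flag = c(FALSE, FALSE, TRUE)
)

# Superseded form, as used in this document.
via_select_if <- df %>% select_if(is.logical)

# Current form: where() wraps any function that returns TRUE/FALSE per column.
via_where <- df %>% select(where(is.logical))
```

Both calls keep the flag and other_flag columns and drop score.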