
    knitr::opts_chunk$set(collapse = TRUE, comment = "#>")

NOTE: The terms "vector" and "variable" are (mostly) used interchangeably in this document.
`stype` provides an extensible set of data types that extend certain R vector classes to be useful in a variety of analytic applications by providing vectors with:

* a `context` object with information about how the variable relates to a study design;
* a `data_summary` with relevant summary statistics of the variable;
* updates to the `data_summary` when a vector is subset or modified in certain ways;
* preservation of the `context`.

The package relies heavily on the `vctrs` package, whose goals are:

* To propose `vec_size()` and `vec_type()` as alternatives to `length()` and `class()`; `vignette("type-size")`. These definitions are paired with a framework for type-coercion and size-recycling.
* To define type- and size-stability as desirable function properties, use them to analyse existing base functions, and propose better alternatives; `vignette("stability")`. This work has been particularly motivated by thinking about the ideal properties of `c()`, `ifelse()`, and `rbind()`.
* To provide a new `vctr` base class that makes it easy to create new S3 vectors; `vignette("s3-vector")`. vctrs provides methods for many base generics in terms of a few new vctrs generics, making implementation considerably simpler and more robust.

Each data type provided by `stype` (described in more detail below) has a constructor function named `v_<type>`. For example, `v_binary` creates binary ($\{0, 1\}$) data from R's `logical` type.

    library(stype)
    x <- v_binary(c(TRUE, TRUE, TRUE, FALSE))
    str(x)

The `v_binary` data type prints `0`s and `1`s, but the underlying data is `logical`:

    x
    vctrs::vec_data(x)

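The split between what is printed and what is stored can be imitated with a plain S3 class. The sketch below is not `stype`'s implementation (which builds on `vctrs`); the `toy_binary` class, `new_toy_binary()`, and `is_toy_binary()` are hypothetical names used only for illustration.

```r
# Hypothetical toy class: stores logical data but displays it as 0/1.
new_toy_binary <- function(x = logical()) {
  stopifnot(is.logical(x))
  structure(x, class = "toy_binary")
}

# Predicate in the spirit of is_binary() (hypothetical name).
is_toy_binary <- function(x) inherits(x, "toy_binary")

# format() controls the displayed values; the stored data is untouched.
format.toy_binary <- function(x, ...) {
  as.character(as.integer(unclass(x)))
}

print.toy_binary <- function(x, ...) {
  cat("<toy_binary[", length(x), "]>\n", sep = "")
  print(format(x), quote = FALSE)
  invisible(x)
}

b <- new_toy_binary(c(TRUE, TRUE, TRUE, FALSE))

shown  <- format(b)    # character vector "1" "1" "1" "0"
stored <- unclass(b)   # still a logical vector
is_toy_binary(b)
```

The design point is that the display layer (`format()`) and the storage layer (`unclass()`) are kept separate, so changing how values print never changes the data.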
The data type includes some useful utilities, such as pretty-printing certain parts of the `description` (here the proportion) and a predicate function.

    is_binary(x)

Certain math operations work and pull directly from the `description` where appropriate (rather than recomputing). Note that these operations are still under development and should be used with caution:

    mean(x)
    sum(x)
    # sum(x, x) # See? Very experimental.

Other math/arithmetic operations don't work:

    # What do you mean you want to add binary and integer?
    x + 2L
    # R's base types are not so safe
    vctrs::vec_data(x) + 2L

Logical operators work as one might expect:

    !x
    all(x)
    any(x)

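One way to get this kind of behaviour in an S3 class is an `Ops` group method that rejects arithmetic and hands the remaining operators to base R. This is a minimal sketch under that assumption, not a description of how `stype` does it; the `toy_bit` class, `new_toy_bit()`, and `toy_bit_data()` are hypothetical.

```r
# Hypothetical toy class wrapping a logical vector.
new_toy_bit <- function(x = logical()) {
  stopifnot(is.logical(x))
  structure(x, class = "toy_bit")
}

# Drop every attribute (including the class) to reach the raw data.
toy_bit_data <- function(x) {
  attributes(x) <- NULL
  x
}

# Group method for operators: block arithmetic, allow everything else.
Ops.toy_bit <- function(e1, e2) {
  arithmetic <- c("+", "-", "*", "/", "^", "%%", "%/%")
  if (.Generic %in% arithmetic) {
    stop("`", .Generic, "` is not defined for <toy_bit>", call. = FALSE)
  }
  op <- get(.Generic, mode = "function")
  if (nargs() == 1L) {
    # Unary operator (here only `!` remains): keep the class on the result.
    return(new_toy_bit(op(toy_bit_data(e1))))
  }
  # Binary comparison or logical operator: return a plain logical vector.
  op(toy_bit_data(e1), toy_bit_data(e2))
}

b <- new_toy_bit(c(TRUE, TRUE, TRUE, FALSE))

negated <- !b                            # still a <toy_bit>
all(b)
any(b)
blocked <- try(b + 2L, silent = TRUE)    # arithmetic raises an error
coerced <- toy_bit_data(b) + 2L          # plain logical is silently coerced
```

Stripping the class before delegating keeps the method from calling itself, and failing loudly on `+` is what separates the classed vector from a bare `logical`, which R coerces to integer without complaint.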
Here's where the real magic is: combining and subsetting vectors keeps their attributes up to date.

    # vectors can be combined and ...
    # subsetting maintains and updates attributes
    c(x, !x[1:3])
    # But ...
    c(x, v_binary(context = context(purpose = purpose(study_role = "other"))))

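The idea of attributes that survive subsetting and combining can be sketched with base R S3 methods. Everything here is hypothetical and much simpler than `stype`: a `role` string stands in for the context, a small list stands in for the data summary, and refusing to combine vectors with different roles is a policy assumed for this sketch rather than documented `stype` behaviour.

```r
# Hypothetical toy class: logical data plus a role and a cached summary.
new_toy_flag <- function(x = logical(), role = "unspecified") {
  stopifnot(is.logical(x), is.character(role), length(role) == 1L)
  structure(
    x,
    class = "toy_flag",
    role = role,  # stands in for a context
    summary = list(n = length(x), proportion = mean(x, na.rm = TRUE))
  )
}

# Raw data with every attribute removed.
toy_flag_data <- function(x) {
  attributes(x) <- NULL
  x
}

# Subsetting keeps the role and recomputes the summary.
`[.toy_flag` <- function(x, i, ...) {
  new_toy_flag(toy_flag_data(x)[i], role = attr(x, "role"))
}

# Combining requires matching roles (an assumption of this sketch).
c.toy_flag <- function(...) {
  parts <- list(...)
  stopifnot(all(vapply(parts, function(p) inherits(p, "toy_flag"), logical(1))))
  roles <- unique(vapply(parts, function(p) attr(p, "role"), character(1)))
  if (length(roles) > 1L) {
    stop("cannot combine <toy_flag> vectors with different roles", call. = FALSE)
  }
  new_toy_flag(unlist(lapply(parts, toy_flag_data)), role = roles)
}

f <- new_toy_flag(c(TRUE, TRUE, TRUE, FALSE), role = "outcome")
first3   <- f[1:3]        # summary is recomputed for the subset
combined <- c(f, first3)  # same role, so combining succeeds
mismatch <- try(c(f, new_toy_flag(TRUE, role = "other")), silent = TRUE)
```

Routing both methods through the constructor means the summary is always rebuilt from the data, so it cannot silently go stale after an operation.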
The following table describes the proposed data types; not all of them may be available at this time. A `--` indicates that the type inherits properties from the level above.

| `v_<type>`            | prototype | support                |
|-----------------------|-----------|------------------------|
| `v_binary`            | `logical` | $\{0, 1\}$             |
| `v_count`             | `integer` | $\{0, 1, 2, \dots\}$   |
| `v_continuous`        | `double`  | $\mathbb{R}$           |
| `v_continuous_nonneg` | `double`  | $\mathbb{R}^+$         |
| `v_nominal`           | `factor`  |                        |
| `v_ordered`           | `ordered` |                        |
| `v_proportion`        | `double`  | $[0, 1]$               |

`stype` vectors can be used as columns in a `tibble`:

    library(dplyr)
    library(tibble)

    n <- 100

    # Build a context that records the variable's role in the study
    make_context <- function(role) {
      context(purpose = purpose(study_role = role))
    }

    # Ten binary covariates named x1, ..., x10
    covariates <- purrr::map(
      .x = purrr::set_names(1:10, paste0("x", 1:10)),
      .f = ~ v_binary(as.logical(rbinom(n, 1, 0.25)), context = make_context("covariate"))
    )

    # Three outcomes of different types, followed by the covariates
    dt <- tibble(
      y1 = v_binary(as.logical(rbinom(n, 1, 0.25)), context = make_context("outcome")),
      y2 = v_continuous_nonneg(runif(n, 1, 100), context = make_context("outcome")),
      y3 = v_continuous(rnorm(n), context = make_context("outcome")),
      !!!covariates
    )

    dt

Selecting columns based on data type:

    dt %>% select_if(is_binary)

Selecting columns based on context:

    dt %>% select_if(is_outcome)

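`select_if()` keeps the columns for which a predicate returns `TRUE`. The same mechanism can be sketched in base R with `Filter()`. In this sketch a plain `role` attribute stands in for the `stype` context, and `with_role()` and `has_role()` are hypothetical helpers that belong to neither `stype` nor `dplyr`.

```r
# Hypothetical helper: tag a vector with a study role via a plain attribute.
with_role <- function(x, role) structure(x, role = role)

# Hypothetical predicate factory: does a column carry the given role?
has_role <- function(role) {
  function(x) identical(attr(x, "role"), role)
}

df <- data.frame(
  y1 = with_role(c(TRUE, FALSE, TRUE), "outcome"),
  y2 = with_role(c(1.5, 2.5, 3.5), "outcome"),
  x1 = with_role(c(FALSE, FALSE, TRUE), "covariate")
)

# Select columns by role, then by storage type.
outcome_cols <- names(Filter(has_role("outcome"), df))
logical_cols <- names(Filter(is.logical, df))
```

Because the predicate only inspects the column itself, the same function works for selecting by type or by context. In current `dplyr`, `select_if()` is superseded and the equivalent spelling is `select(where(predicate))`.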