knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

NOTE: The terms "vector" and "variable" are (mostly) used interchangeably in this document.

Design Goals

stype provides an extensible set of data types that extend certain R vector classes so that they are useful in a variety of analytic applications, by providing vectors with:

Implementation

The package relies heavily on the vctrs package whose goals are:

A quick example

Each data type provided by stype (described in more detail below) has a constructor function whose name begins with v_<type>. For example, v_binary creates binary ($\{0, 1\}$) data from R's logical type.

library(stype)

x <- v_binary(c(TRUE, TRUE, TRUE, FALSE))

str(x)

The v_binary data type prints 0s and 1s but the underlying data is logical:

x
vctrs::vec_data(x)
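
To illustrate how a vctrs-based type can display 0s and 1s while storing logical data, here is a minimal sketch. It is not stype's implementation: the class name toy_binary and the constructor new_toy_binary are hypothetical, and only vctrs::new_vctr() and vctrs::vec_data() are real vctrs functions.

```r
library(vctrs)

# Hypothetical constructor: a logical vector tagged with a toy S3 class.
new_toy_binary <- function(x = logical()) {
  stopifnot(is.logical(x))
  new_vctr(x, class = "toy_binary")
}

# format() controls what is displayed; the stored data stays logical.
format.toy_binary <- function(x, ...) {
  as.character(as.integer(vec_data(x)))
}

b <- new_toy_binary(c(TRUE, TRUE, TRUE, FALSE))
format(b)     # character representation used for display
vec_data(b)   # the underlying logical vector
```

Separating the display (a format method) from the storage (the data passed to new_vctr()) is what allows a type to look like 0/1 data while still behaving as logical where that matters.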

The data type includes some useful utilities, such as pretty-printing parts of the description (here, the proportion) and a predicate function:

is_binary(x)

Certain math operations work and pull directly from the description where appropriate (rather than recomputing). Note these operations are still under development and should be used with caution:

mean(x)
sum(x)

# sum(x, x) # See? very experimental

Other math/arithmetic operations don't work:

# What do you mean you want to add binary and integer?
x + 2L 

# R's base types are not so safe
vctrs::vec_data(x) + 2L
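
The same contrast can be reproduced with vctrs alone. The sketch below uses a hypothetical toy_binary class (not stype's v_binary) and assumes only that no vec_arith() method has been defined for it, in which case vctrs rejects the addition. Whether stype relies on exactly this default is an assumption, not something the text above confirms.

```r
library(vctrs)

# Hypothetical toy class (not stype's v_binary): logical data with an S3 tag.
b <- new_vctr(c(TRUE, TRUE, TRUE, FALSE), class = "toy_binary")

# No vec_arith() method is defined for toy_binary, so `+` is rejected.
res <- tryCatch(b + 2L, error = function(e) e)
inherits(res, "error")

# Stripping the class falls back to base R coercion of logical to integer.
bare_sum <- vec_data(b) + 2L
bare_sum
```

The design choice here is that arithmetic is opt-in: a class only supports the operations for which its author has written a method.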

Logical operators work as one might expect:

!x
all(x)
any(x)

Here's where the real magic is.

# vectors can be combined and ...
# subsetting maintains and updates attributes
c(x, !x[1:3])

# But ...
c(x, v_binary(context = context(purpose = purpose(study_role = "other"))))

Data types

The following table describes the proposed data types (not all of these may be available at this time). A -- indicates that the type inherits properties from the level above.

| v_<type>            | prototype | support            |
|---------------------|-----------|--------------------|
| v_binary            | logical   | $\{0, 1\}$         |
| v_count             | integer   | $(0, 1, 2, \dots)$ |
| v_continuous        | double    | $\mathcal{R}$      |
| v_continuous_nonneg | double    | $\mathcal{R}^+$    |
| v_nominal           | factor    |                    |
| v_ordered           | ordered   |                    |
| v_proportion        | double    | $[0, 1]$           |
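
The support column can be read as a membership test on values. The following base R sketch makes that reading concrete. The helper in_support() is hypothetical and not part of stype, and it assumes that $\mathcal{R}^+$ includes zero, as the name "nonneg" suggests.

```r
# Hypothetical helper (not part of stype): TRUE where a value lies in the
# support listed in the table for the given type.
in_support <- function(x, type) {
  switch(
    type,
    binary            = x %in% c(0, 1),
    count             = x >= 0 & x == floor(x),
    continuous        = is.finite(x),
    continuous_nonneg = is.finite(x) & x >= 0,  # assumes 0 is allowed
    proportion        = x >= 0 & x <= 1,
    stop("unknown type: ", type)
  )
}

in_support(c(0, 1, 2), "binary")
in_support(c(-1, 0, 2.5, 3), "count")
in_support(c(0, 0.5, 1.2), "proportion")
```

The nominal and ordered types are omitted because the table lists no numeric support for them.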

Usage in tibble

library(dplyr)
library(tibble)
n <- 100 

make_context <- function(role){
  context(purpose = purpose(study_role = role))
}

covariates <-
  purrr::map(
    .x = purrr::set_names(1:10, paste0("x", 1:10)),
    .f = ~ v_binary(as.logical(rbinom(n, 1, 0.25)),
                    context = make_context("covariate"))
  )

dt <- tibble(
  y1 = v_binary(as.logical(rbinom(n, 1, 0.25)), context = make_context("outcome")),
  y2 = v_continuous_nonneg(runif(n, 1, 100), context = make_context("outcome")),
  y3 = v_continuous(rnorm(n), context = make_context("outcome")),
  !!! covariates
)

dt

Selecting columns based on data type:

dt %>% select_if(is_binary)

Selecting columns based on context:

dt %>% select_if(is_outcome)
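
In dplyr 1.0.0 and later, select_if() is superseded by select() combined with where(). The sketch below shows that pattern on stand-in data with base R column types, so it runs without stype. Assuming is_binary and is_outcome behave as ordinary column predicates (as their use with select_if() above suggests), the equivalent calls would presumably be dt %>% select(where(is_binary)) and dt %>% select(where(is_outcome)).

```r
library(dplyr)

# Stand-in data with base R column types (no stype classes involved).
df <- tibble(a = c(TRUE, FALSE), b = c(1.5, 2), c = c(FALSE, FALSE))

# where() wraps a predicate that is applied to each column.
logical_cols <- df %>% select(where(is.logical))
names(logical_cols)
```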


novisci/stype documentation built on July 28, 2022, 7:44 a.m.