subset_sample: Subset to analytic sample


Source: R/subset_sample.R

Description

subset_sample() subsets a data.table to a smaller analytic dataset. It applies the keep criteria one at a time, printing how many observations remain after each sequential subset, as well as how many observations meet the keep criteria overall.
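
The sequential logic described above can be pictured with the minimal sketch below. It is not the appmicro source: the function name subset_sample_sketch() is hypothetical, the printed wording will differ from the real function, and it assumes that rows whose dummy is NA are dropped.

```r
library(data.table)

# Hypothetical sketch, NOT the appmicro implementation: keep rows where each
# dummy in subset_vars equals 1 (or TRUE), one criterion at a time, and
# report how many observations remain after each step.
subset_sample_sketch <- function(DT, subset_vars) {
  stopifnot(is.data.table(DT), all(subset_vars %in% names(DT)))
  out <- DT
  cat("Starting observations:", nrow(out), "\n")
  for (v in subset_vars) {
    # which() drops NA, so rows with a missing dummy are treated as drops
    # (an assumption; the real function may handle NA differently)
    keep_rows <- which(out[[v]] == 1)
    # a single variable name in i is evaluated in the calling scope
    out <- out[keep_rows]
    cat("After keeping", v, "== 1:", nrow(out), "observations remain\n")
  }
  cat("Observations meeting all keep criteria:", nrow(out),
      "of", nrow(DT), "\n")
  out
}
```

With the keep columns built as in the Examples section, the sketch would be called as subset_sample_sketch(DT, c("keep_sched_dep_time", "keep_origin")).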

Usage

subset_sample(DT, subset_vars)

Arguments

DT

A data.table.

subset_vars

A character vector of column names in DT. Each column should be a dummy (indicator) variable, with 1 (or TRUE) as the keep condition and 0 (or FALSE) as the drop condition.
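
Because every column in subset_vars is expected to be a 0/1 or FALSE/TRUE dummy, it can help to check this before subsetting. The helper below is a hypothetical sketch, not part of appmicro; it assumes NA values are tolerated and that any other value should be treated as an error.

```r
library(data.table)

# Hypothetical helper (not part of appmicro): confirm that each column in
# subset_vars exists in DT and holds only 0/1 or FALSE/TRUE values (NA allowed).
check_subset_vars <- function(DT, subset_vars) {
  missing_cols <- setdiff(subset_vars, names(DT))
  if (length(missing_cols) > 0) {
    stop("Columns not found in DT: ", paste(missing_cols, collapse = ", "))
  }
  for (v in subset_vars) {
    x <- DT[[v]]
    # TRUE/FALSE match 1/0 under %in%, so logical dummies pass as well
    ok <- (is.logical(x) || is.numeric(x)) && all(x %in% c(0, 1, NA))
    if (!ok) stop("Column ", v, " is not a 0/1 or TRUE/FALSE dummy")
  }
  invisible(TRUE)
}
```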

Value

The subsetted DT, containing only the rows that meet every keep criterion in subset_vars.

Examples

library(data.table)
# 2013 NYC flights data (requires the nycflights13 package)
DT <- as.data.table(nycflights13::flights)
# define keep criteria (1 for keep, 0 for drop)
# afternoon flights (scheduled departure at 12:00 or later)
DT[, `:=`(keep_sched_dep_time = ifelse(sched_dep_time >= 1200, 1, 0),
          # departing from Newark
          keep_origin = ifelse(origin == "EWR", 1, 0))]
# assign subsetted data and print observations at each step
DT_sub <- subset_sample(DT, subset_vars = c("keep_sched_dep_time",
                                            "keep_origin"))

appmicro/appmicro documentation built on Nov. 2, 2019, 1:58 p.m.