SampleStratified: Draws a random, stratified sample from a vector of indices.


Description

Given a vector of logical values, this returns a logical index in which every TRUE value is kept and the FALSE values are randomly sampled.

Usage

SampleStratified(idxTrue, scale=1, verbose=TRUE)

Arguments

idxTrue

An array of logical TRUE / FALSE values. All TRUE values are kept (their index is always TRUE), and FALSE values are sampled (their index may be TRUE or FALSE).

scale

Controls the sampling rate for FALSE values. See the Details section below for more information.

verbose

If TRUE then summary information is printed to the screen.

Details

All TRUE values from the input index are kept. The number of FALSE values that are kept is computed as follows:

sampleRate = \sqrt{\frac{nFalse}{nTrue}} \cdot \frac{1}{scale}

numKeep = round\left( \frac{nFalse}{sampleRate} \right)

Here nFalse and nTrue are the numbers of FALSE and TRUE values in the array idxTrue. Note that if sampleRate is less than 1, no sampling is performed: all FALSE values are kept. Values of scale greater than 1 result in more FALSE values being kept; values below 1 result in fewer.
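The arithmetic above can be checked with a small sketch. The helper numKeepFalse is hypothetical and not part of the Causata package; it only reproduces the two formulas and the "no sampling" rule as described in this section, for counts chosen purely for illustration.

```r
# Hypothetical helper (not part of Causata): number of FALSE values kept,
# following the formulas in the Details section.
numKeepFalse <- function(nTrue, nFalse, scale = 1) {
  sampleRate <- sqrt(nFalse / nTrue) / scale
  if (sampleRate < 1) {
    # No sampling is performed: every FALSE value is kept.
    return(nFalse)
  }
  round(nFalse / sampleRate)
}

numKeepFalse(nTrue = 100, nFalse = 10000)             # sampleRate = 10 -> 1000
numKeepFalse(nTrue = 100, nFalse = 10000, scale = 2)  # sampleRate = 5  -> 2000
numKeepFalse(nTrue = 100, nFalse = 50)                # sampleRate < 1  -> 50
```

With 100 TRUE and 10000 FALSE values, doubling scale halves sampleRate and so doubles the number of FALSE values kept.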

Value

An array of logical values indicating which records should be kept.
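To show the shape of the returned value without the package data, here is a minimal sketch on synthetic input. The function sampleStratifiedSketch is a hypothetical re-implementation based only on the description on this page; the real SampleStratified may select FALSE records differently. It assumes idxTrue contains no NA values and at least one TRUE value.

```r
# Illustrative re-implementation under stated assumptions; not the package code.
sampleStratifiedSketch <- function(idxTrue, scale = 1) {
  nTrue  <- sum(idxTrue)
  nFalse <- sum(!idxTrue)
  sampleRate <- sqrt(nFalse / nTrue) / scale
  keep <- idxTrue                         # all TRUE records are always kept
  if (sampleRate < 1) {
    keep[] <- TRUE                        # no sampling: keep every record
  } else {
    numKeep  <- round(nFalse / sampleRate)
    falsePos <- which(!idxTrue)           # positions of the FALSE records
    keep[falsePos[sample.int(nFalse, numKeep)]] <- TRUE
  }
  keep
}

set.seed(1)                               # make the random draw repeatable
idxTrue <- c(rep(TRUE, 100), rep(FALSE, 10000))
keep <- sampleStratifiedSketch(idxTrue)
table(idxTrue, keep)                      # 100 TRUE kept, 1000 of 10000 FALSE kept
```

The result has the same length as the input, so it can be used directly to subset the rows of a data frame.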

Author(s)

Justin Hemann <support@causata.com>

Examples

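library(Causata)
data(df.causata)
# Keep every responder; sample the non-responders at the default scale of 1.
idx <- SampleStratified(df.causata$has.responded.mobile.logoff_next.hour_466=="true")
# Cross-tabulate the response variable against the records that were kept.
table(df.causata$has.responded.mobile.logoff_next.hour_466, idx)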

Causata documentation built on May 2, 2019, 3:26 a.m.