SampleStratified: Draws a random, stratified sample from a vector of indices.
In Causata: Analysis utilities for binary classification and Causata users.


Given a vector of logical values, this returns an index where TRUE values are kept and FALSE values are sampled.

SampleStratified(idxTrue, scale=1, verbose=TRUE)

`idxTrue`	An array of logical TRUE / FALSE values. All TRUE values are kept (their index is always TRUE), and FALSE values are sampled (their index may be TRUE or FALSE).
`scale`	Controls the sampling rate for FALSE values. See the Details section below for more information.
`verbose`	If TRUE then summary information is printed to the screen.

All TRUE values from the input index are kept. The number of FALSE values that are kept is computed as follows:

sampleRate = \sqrt{ \frac{nFalse}{nTrue} } \cdot \frac{1}{scale}

numKeep = round( \frac{nFalse}{sampleRate} )

Here nFalse and nTrue are the numbers of FALSE and TRUE values in the array idxTrue. Note that if sampleRate is less than 1, then no sampling is performed: all FALSE values are kept. Values of scale greater than 1 result in more FALSE values being kept; values below 1 result in fewer.
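The arithmetic above can be sketched in a few lines of R. This is an illustration only, not the package's implementation: `sketch_num_keep` and `sketch_sample` are hypothetical helper names, and drawing the retained FALSE values uniformly without replacement is an assumption. The sketch also assumes idxTrue contains at least one TRUE and one FALSE value.

```r
# Hypothetical helpers (not part of Causata) that mirror the documented formulas.
sketch_num_keep <- function(idxTrue, scale = 1) {
  nTrue  <- sum(idxTrue)
  nFalse <- sum(!idxTrue)
  sampleRate <- sqrt(nFalse / nTrue) / scale
  if (sampleRate < 1) {
    return(nFalse)  # no sampling: every FALSE value is kept
  }
  round(nFalse / sampleRate)
}

sketch_sample <- function(idxTrue, scale = 1) {
  numKeep  <- sketch_num_keep(idxTrue, scale)
  keep     <- idxTrue            # all TRUE values are always kept
  falsePos <- which(!idxTrue)
  # Assumption: FALSE values are chosen uniformly, without replacement.
  chosen <- falsePos[sample.int(length(falsePos), numKeep)]
  keep[chosen] <- TRUE
  keep
}

idxTrue <- c(rep(TRUE, 100), rep(FALSE, 10000))
sketch_num_keep(idxTrue)             # sampleRate = 10, so 1000 FALSE values kept
sketch_num_keep(idxTrue, scale = 2)  # sampleRate = 5,  so 2000 FALSE values kept
```

With 100 TRUE and 10000 FALSE values, the default scale keeps one FALSE value in ten; doubling scale halves sampleRate and so doubles the number of FALSE values retained.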

An array of logical values indicating which records should be kept.

Justin Hemann <support@causata.com>

library(Causata)
data(df.causata)
idx <- SampleStratified(df.causata$has.responded.mobile.logoff_next.hour_466=="true")
table(df.causata$has.responded.mobile.logoff_next.hour_466, idx)