Extracts a stratified sample of data
GetStratifiedSample(connect, query, stratification.variable,
                    stratification.variable.name,
                    stratification.value=0)
connect: A Causata connection object, used to resample at the stratified sampling rates.
query: A Causata query object, used to resample at the stratified sampling rates. Note that the …
stratification.variable: A vector of values on which to base the stratification.
stratification.variable.name: The name of the Causata variable used as the basis of stratification.
stratification.value: The value of stratification.variable that determines the stratum of a record (default 0).
This function gets a stratified sample of data from Causata. The population is split into two strata according to whether a record's stratification.variable value matches stratification.value. Sampling rates for the two strata are then calculated, where the rate for the larger stratum, strata.A, is:
sample.rate.A = sqrt((# records in strata.B) / (# records in strata.A))
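As a worked illustration of the formula above, the sketch below computes the rate for the larger stratum, and the matching inverse-probability weight, from a vector already held in memory. The helper name `stratum.sample.rate` is hypothetical and not part of the Causata package; it also assumes that NA values count as non-matching records, which the text above does not specify.

```r
# Hypothetical helper (not part of the Causata package): applies the
# sample.rate.A formula to a local vector of stratification values.
stratum.sample.rate <- function(stratification.variable, stratification.value) {
  # Records matching stratification.value form one stratum; all others
  # form the second stratum (ASSUMPTION: NA values count as non-matching)
  in.match <- stratification.variable %in% stratification.value
  n.match <- sum(in.match)
  n.other <- sum(!in.match)
  # strata.A is the larger stratum, strata.B the smaller one
  n.A <- max(n.match, n.other)
  n.B <- min(n.match, n.other)
  rate.A <- sqrt(n.B / n.A)
  # The weight is the inverse of the selection probability
  list(n.A = n.A, n.B = n.B, sample.rate.A = rate.A, weight.A = 1 / rate.A)
}

# 9000 non-purchasers and 1000 purchasers
has.purchase <- c(rep("false", 9000), rep("true", 1000))
rates <- stratum.sample.rate(has.purchase, "true")
rates$sample.rate.A  # sqrt(1000 / 9000), about 0.333
rates$weight.A       # about 3: each sampled strata.A record stands for 3 records
```

With a 9:1 imbalance, the larger stratum is sampled at one third, so roughly 3000 of its 9000 records are kept.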
New queries are run to resample the Causata data at these sample rates.
Returns a list with two elements:
df: A data frame of sampled data containing all of the variables found in …
weights: A vector of weight values. Each weight is the inverse of the probability of selecting that record in the sample.
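Because the weights are inverse selection probabilities, a weighted sum over the sample estimates the corresponding population total. The sketch below imitates this on an in-memory data frame using independent (Bernoulli) sampling; it is not how GetStratifiedSample works internally, since that function runs new queries against the Causata server. `local.stratified.sample` is a hypothetical name, and the sketch assumes, because the text above gives only the rate for strata.A, that the smaller stratum is kept in full (rate 1).

```r
# Hypothetical local imitation of stratified resampling; the real
# GetStratifiedSample runs new queries against the Causata server instead.
local.stratified.sample <- function(df, stratification.variable, stratification.value) {
  in.match <- stratification.variable %in% stratification.value
  # Identify the larger stratum (strata.A)
  A.is.match <- sum(in.match) >= sum(!in.match)
  in.A <- if (A.is.match) in.match else !in.match
  rate.A <- sqrt(sum(!in.A) / sum(in.A))
  # ASSUMPTION: the smaller stratum (strata.B) is sampled at rate 1
  rate <- ifelse(in.A, rate.A, 1)
  # Keep each record independently with probability rate
  keep <- runif(nrow(df)) < rate
  # Weight = inverse of the probability of selecting the record
  list(df = df[keep, , drop = FALSE], weights = 1 / rate[keep])
}

set.seed(1)
population <- data.frame(
  customer.id = 1:10000,
  has.purchase = c(rep("false", 9000), rep("true", 1000))
)
sampled <- local.stratified.sample(population, population$has.purchase, "true")
table(sampled$weights)  # weight 3 for sampled "false" rows, 1 for "true" rows
sum(sampled$weights)    # close to the population size of 10000
```

Summing the weights recovers approximately the full population count, which is the property that makes weighted summaries of the stratified sample comparable to the unsampled data.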
Suzanne Weller <support@causata.com>
# create some variables to query for
variables <- c('customer-id', 'total-spend')
# create a stratified sample given an initial query
# The commands below are commented out since they require an actual server connection
#connection <- Connect(hostname="server.causata.com",
#  username="user@gmail.com", password="enw8Q!mN")
#query <- Query() + Limit(500)
#df <- GetData(connection, query)
# df must contain the stratification variable has.purchase__Next.30.Days
#sampled.data <- GetStratifiedSample(connection, query,
#  df[['has.purchase__Next.30.Days']], 'has.purchase__Next.30.Days', "true")
#table(sampled.data$weights)