sim_postcode_samples (R Documentation)
Simulate a high-cardinality feature and a binary response
sim_postcode_samples(
df_levels,
n = 2000L,
threshold = 1000,
prob = c(0.3, 0.1),
seed = 1001
)
df_levels: Data frame of postal code levels, such as the output of sim_postcode_levels().
n: Number of samples.
threshold: The threshold for determining whether a postal code is rare.
prob: Numeric vector of length 2 giving the occurrence probability of the class 1 event in rare and non-rare postal codes, respectively.
seed: Random seed.
A data frame of samples with postal codes, response labels, and level rarity status.
The code is derived from the example described in the "rare levels" vignette of the vtreat package.
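To show how threshold and prob interact, here is a minimal base-R sketch of this kind of simulation. It is modeled on our reading of the vtreat rare-levels example, not taken from this package's source. The helper functions make_levels_sketch() and sample_postcodes_sketch(), the code and popsize columns, and the log-normal population sizes are all hypothetical. The sketch assumes that a level is rare when its population size falls below threshold, and that postal codes are drawn with probability proportional to population size.

```r
# Hypothetical sketch, NOT the package's implementation.
# Assumption: each level has a population size; rare = popsize < threshold.
make_levels_sketch <- function(nlevels = 500, seed = 42) {
  set.seed(seed)
  data.frame(
    code = sprintf("z%05d", sample.int(99999, nlevels)),  # unique fake codes
    popsize = round(rlnorm(nlevels, meanlog = log(4000), sdlog = 1)),
    stringsAsFactors = FALSE
  )
}

sample_postcodes_sketch <- function(levels, n = 2000L, threshold = 1000,
                                    prob = c(0.3, 0.1), seed = 1001) {
  set.seed(seed)
  # Levels whose population size is below the threshold count as rare
  rare_codes <- levels$code[levels$popsize < threshold]
  # Draw postal codes with probability proportional to population size
  code <- sample(levels$code, size = n, replace = TRUE, prob = levels$popsize)
  rare <- code %in% rare_codes
  # Class 1 probability: prob[1] for rare codes, prob[2] for non-rare codes
  y <- rbinom(n, size = 1, prob = ifelse(rare, prob[1], prob[2]))
  data.frame(code = code, y = y, rare = rare, stringsAsFactors = FALSE)
}

lv <- make_levels_sketch(nlevels = 500, seed = 42)
df <- sample_postcodes_sketch(lv, n = 10000, threshold = 3000,
                              prob = c(0.2, 0.1), seed = 43)
# Empirical class 1 rate within non-rare (FALSE) and rare (TRUE) codes
tapply(df$y, df$rare, mean)
```

Because rare codes are sampled less often, the rare group contributes comparatively few rows. This sparsity is the situation the vtreat example uses to motivate special handling of rare levels.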
df_levels <- sim_postcode_levels(nlevels = 500, seed = 42)
df_postcode <- sim_postcode_samples(
df_levels,
n = 10000, threshold = 3000, prob = c(0.2, 0.1), seed = 43
)
head(df_postcode)