split_strata | R Documentation |
Splits pre-defined sampling strata based on values of a continuous or categorical variable.
split_strata(
data,
strata,
split = NULL,
split_var,
type = "global quantile",
split_at = 0.5,
trunc = NULL
)
data |
a dataframe or matrix with one row for each sampling unit, one column specifying each unit's current stratum, one column containing the continuous or categorical values that will define the split, and any other relevant columns. |
strata |
a character string specifying the name of the column that defines each unit's current stratum. |
split |
the name of the stratum or strata to be split, exactly as they appear in the strata column. Defaults to NULL, which splits all strata. |
split_var |
a character string specifying the name of the column that should be used to define the strata splits. |
type |
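a character string specifying how the function should interpret the split_at argument. The examples use "global quantile" (the default), "local quantile", "value", and "categorical". |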
split_at |
the percentile, value, or name(s) at which split_var should be split. Defaults to 0.5. |
trunc |
a numeric or character value specifying how the name of the split_var column should be truncated when naming the new strata. Defaults to NULL. |
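The difference between the two quantile types used in the examples can be sketched in base R. The sketch assumes that "global quantile" takes the quantile of split_var over every row of data, while "local quantile" takes it only over the rows of the stratum being split. That reading is an assumption made for illustration, and the code does not reproduce the internals of split_strata.

```r
# Illustration only: base R, not the internals of split_strata().
# Assumption: "global quantile" uses all rows; "local quantile" uses only
# the rows of the stratum being split.
data(iris)

p <- 0.5
in_setosa <- iris$Species == "setosa"

# Cut point computed from every unit in the data
global_cut <- quantile(iris$Sepal.Width, probs = p, names = FALSE)

# Cut point computed from the units in the "setosa" stratum only
local_cut <- quantile(iris$Sepal.Width[in_setosa], probs = p, names = FALSE)

# Number of setosa units at or below each cut point
n_global <- sum(iris$Sepal.Width[in_setosa] <= global_cut)
n_local <- sum(iris$Sepal.Width[in_setosa] <= local_cut)

c(global_cut = global_cut, local_cut = local_cut)
c(n_global = n_global, n_local = n_local)
```

Under this reading, a local quantile of 0.5 divides the chosen stratum roughly in half, whereas a global quantile can divide it unevenly when the stratum's values sit above or below the overall distribution.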
For splits on continuous variables, the new strata are defined on left-open intervals. The only exception is the first interval, which must include the overall minimum value. For a split on a continuous variable, each new stratum name is the old stratum name followed by the split_var column name and the range of values defining that stratum. For a categorical split, each new stratum name is the old stratum name followed by the split_var column name and a 1/0 flag indicating whether the unit's value is in split_at. If the split_var column name is long, the user can specify a value for trunc to prevent the new strata names from being inconveniently long.
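The interval convention described above matches what base R's cut() produces with right = TRUE and include.lowest = TRUE: every interval is left-open except the first, which is closed so that the minimum is included. The sketch below is a base R illustration only. The cut points, the "." and "_" separators in the names, and the shortened variable label are hypothetical choices for the example, not the exact output format of split_strata.

```r
# Illustration only: base R, not the internals of split_strata().
data(iris)
setosa <- iris[iris$Species == "setosa", ]

# Hypothetical cut points within the observed range of Sepal.Width
cuts <- c(3.1, 3.8)
breaks <- c(min(setosa$Sepal.Width), cuts, max(setosa$Sepal.Width))

# right = TRUE gives left-open intervals (a, b];
# include.lowest = TRUE closes the first one so the minimum is kept: [min, b]
ranges <- cut(setosa$Sepal.Width, breaks = breaks,
              right = TRUE, include.lowest = TRUE)

# Assumed name format: old stratum name, variable name, then the range.
# The "." and "_" separators are placeholders for illustration.
var_label <- substr("Sepal.Width", 1, 5)  # shortened label, in the spirit of trunc
new_names <- paste0(setosa$Species, ".", var_label, "_", ranges)

table(new_names)

# Categorical analogue: a 1/0 flag for membership in the chosen levels
flag <- as.integer(iris$Species %in% c("virginica", "versicolor"))
table(flag)
```

Closing the first interval matters because a fully left-open scheme would drop the unit holding the minimum value, leaving it without a stratum.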
Returns the input dataframe with a new column named "new_strata" that holds the name of the stratum that each sample belongs to after the split. The column containing the previous strata names is retained and given the name "old_strata".
x <- split_strata(iris, "Sepal.Length",
strata = c("Species"),
split = "setosa", split_var = "Sepal.Width",
split_at = c(0.5), type = "global quantile"
)
# You can split at more than one quantile in one call.
# The call below splits the "setosa" stratum into three strata of
# roughly equal size.
x <- split_strata(iris, "Sepal.Length",
strata = c("Species"),
split = "setosa", split_var = "Sepal.Width", split_at = c(0.33, 0.66),
type = "local quantile"
)
# Manually select split values with type = "value"
x <- split_strata(iris, "Sepal.Length",
strata = "Species",
split = "setosa", split_var = "Sepal.Width",
split_at = c(3.1, 3.8), type = "value"
)
# Perform a categorical split.
iris$strata <- rep(c(rep(1, times = 25), rep(0, times = 25)), times = 3)
x <- split_strata(iris, "Sepal.Length",
strata = "strata",
split = NULL, split_var = "Species",
split_at = c("virginica", "versicolor"), type = "categorical"
)
# Splits each of the initial strata (1 and 0) into one stratum containing
# the "virginica" and "versicolor" species and one stratum containing all
# other species, i.e. those not specified in the split_at argument.