validate - asserts the following:
- The column names of data must contain all original_names.
check - returns the following:
- ok: A logical. Does the check pass?
- missing_names: A character vector. The missing column names.
validate_column_names(data, original_names)

check_column_names(data, original_names)
data: A data frame to check.
original_names: A character vector. The original column names.
A special error is thrown if the missing column is named ".outcome". This
only happens when mold() is called using the xy-method and a vector y value
is supplied rather than a data frame or matrix. In that case, y is coerced
to a data frame and given the automatic column name ".outcome", which is
the column that forge() later looks for. If the user then requests outcomes
with forge(..., outcomes = TRUE) but the supplied new_data does not contain
the required ".outcome" column, a special error is thrown telling them what
to do. See the examples!
validate_column_names() returns data invisibly.
check_column_names() returns a named list of two components,
ok and missing_names.
hardhat provides validation functions at two levels.
- check_*(): check a condition, and return a list. The list
  always contains at least one element, ok, a logical that specifies if the
  check passed. Each check also has check-specific elements in the returned
  list that can be used to construct meaningful error messages.
- validate_*(): check a condition, and error if it does not pass. These
  functions call their corresponding check function, and
  then provide a default error message. If you, as a developer, want a
  different error message, then call the check_*() function yourself,
  and provide your own validation function.
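
As a rough illustration of that last point, a developer-written validation function built on check_column_names() might look like the sketch below. The name my_validate_column_names() and the wording of its error message are hypothetical and not part of hardhat; only check_column_names() and its ok and missing_names elements come from the package.

```r
library(hardhat)

# Hypothetical helper (not part of hardhat): same check, custom error message.
my_validate_column_names <- function(data, original_names) {
  check <- check_column_names(data, original_names)

  if (!check$ok) {
    stop(
      "new data is missing required columns: ",
      paste(check$missing_names, collapse = ", "),
      call. = FALSE
    )
  }

  # Mirror validate_column_names() by returning the data invisibly
  invisible(data)
}

original_names <- colnames(mtcars)
bad_test <- mtcars[, -c(3, 4)]

# Errors with the custom message; try() captures it instead of stopping
result <- try(my_validate_column_names(bad_test, original_names), silent = TRUE)
```

Because the helper only reads the list returned by the check function, the same pattern should carry over to the other check_*() functions, each of which exposes its own check-specific elements alongside ok.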
Other validation functions:
validate_no_formula_duplication(),
validate_outcomes_are_binary(),
validate_outcomes_are_factors(),
validate_outcomes_are_numeric(),
validate_outcomes_are_univariate(),
validate_prediction_size(),
validate_predictors_are_numeric()
library(hardhat)

# ---------------------------------------------------------------------------
original_names <- colnames(mtcars)
test <- mtcars
bad_test <- test[, -c(3, 4)]
# All good
check_column_names(test, original_names)
# Missing 2 columns
check_column_names(bad_test, original_names)
# Will error
try(validate_column_names(bad_test, original_names))
# ---------------------------------------------------------------------------
# Special error when `.outcome` is missing
train <- iris[1:100,]
test <- iris[101:150,]
train_x <- subset(train, select = -Species)
train_y <- train$Species
# Here, y is a vector
processed <- mold(train_x, train_y)
# So the default column name is `".outcome"`
processed$outcomes
# It doesn't affect forge() normally
forge(test, processed$blueprint)
# But if the outcome is requested, and `".outcome"`
# is not present in `new_data`, an error is thrown
# with very specific instructions
try(forge(test, processed$blueprint, outcomes = TRUE))
# To get this to work, just create an .outcome column in new_data
test$.outcome <- test$Species
forge(test, processed$blueprint, outcomes = TRUE)