validate_column_names    R Documentation
data contains required column names

validate - asserts the following:
  The column names of data must contain all original_names.

check - returns the following:
  ok: A logical. Does the check pass?
  missing_names: A character vector. The missing column names.

validate_column_names(data, original_names, ..., call = current_env())
check_column_names(data, original_names)

data: A data frame to check.
original_names: A character vector. The original column names.
...: These dots are for future extensions and must be empty.
call: The call used for errors and warnings.

A special error is thrown if the missing column is named ".outcome". This only happens when mold() is called using the xy-method and a vector y value is supplied rather than a data frame or matrix. In that case, y is coerced to a data frame and given the automatic column name ".outcome", which is the name forge() then looks for. If the user requests outcomes using forge(..., outcomes = TRUE) but the supplied new_data does not contain the required ".outcome" column, a special error is thrown telling them what to do. See the examples!

validate_column_names() returns data invisibly.

check_column_names() returns a named list of two components, ok and missing_names.

hardhat provides validation functions at two levels.

check_*(): check a condition, and return a list. The list always contains at least one element, ok, a logical that specifies whether the check passed. Each check also has check-specific elements in the returned list that can be used to construct meaningful error messages.

validate_*(): check a condition, and error if it does not pass. These functions call their corresponding check function and then provide a default error message. If you, as a developer, want a different error message, call the check_*() function yourself and provide your own validation function.
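
The two-level design described above can be sketched as follows. This is a minimal illustration, not part of hardhat: the function name validate_column_names_custom() and the wording of its message are hypothetical, and base stop() is used only to keep the sketch free of extra dependencies.

```r
library(hardhat)

# Hypothetical custom validator (not exported by hardhat).
# It reuses check_column_names() and supplies its own error message.
validate_column_names_custom <- function(data, original_names) {
  result <- check_column_names(data, original_names)

  if (!result$ok) {
    # Build a readable list of the missing columns from the check result
    missing <- paste0("`", result$missing_names, "`", collapse = ", ")
    stop("`data` is missing these columns: ", missing, call. = FALSE)
  }

  # Mirror validate_column_names() by returning the input invisibly
  invisible(data)
}

# Passes silently
validate_column_names_custom(mtcars, colnames(mtcars))

# Errors with the custom message
try(validate_column_names_custom(mtcars[, -c(3, 4)], colnames(mtcars)))
```

Returning data invisibly keeps the custom validator interchangeable with validate_column_names() when it is called for its side effect.
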

Other validation functions: validate_no_formula_duplication(), validate_outcomes_are_binary(), validate_outcomes_are_factors(), validate_outcomes_are_numeric(), validate_outcomes_are_univariate(), validate_prediction_size(), validate_predictors_are_numeric()

# Attach hardhat so the examples run in a fresh session
library(hardhat)

# ---------------------------------------------------------------------------
original_names <- colnames(mtcars)
test <- mtcars
bad_test <- test[, -c(3, 4)]
# All good
check_column_names(test, original_names)
# Missing 2 columns
check_column_names(bad_test, original_names)
# Will error
try(validate_column_names(bad_test, original_names))
# ---------------------------------------------------------------------------
# Special error when `.outcome` is missing
train <- iris[1:100, ]
test <- iris[101:150, ]
train_x <- subset(train, select = -Species)
train_y <- train$Species
# Here, y is a vector
processed <- mold(train_x, train_y)
# So the default column name is `".outcome"`
processed$outcomes
# It doesn't affect forge() normally
forge(test, processed$blueprint)
# But if the outcome is requested, and `".outcome"`
# is not present in `new_data`, an error is thrown
# with very specific instructions
try(forge(test, processed$blueprint, outcomes = TRUE))
# To get this to work, just create an .outcome column in new_data
test$.outcome <- test$Species
forge(test, processed$blueprint, outcomes = TRUE)