View source: R/data_integrity.R
data_integrity_model | R Documentation |
When creating models from a data frame (xgboost, random forest, regression, etc.), each model has its own constraints regarding data types. Many errors during model creation are caused solely by the data format.
For a given model, this function returns the constraints that the data does not satisfy. This way we can anticipate and correct errors before calling model creation. This function is closely related to data_integrity.
data_integrity_model(data, model_name, MAX_UNIQUE = 35)
data |
data frame or a single vector |
model_name |
model name; you can check all the available models by printing the 'metadata_models' data frame. |
MAX_UNIQUE |
max unique threshold to flag a categorical variable as a high cardinality one. Above 35 distinct values it is normally necessary to reduce the number of different values. |
# Example 1:
data_integrity_model(data=heart_disease, model_name="pca")
# Example 2:
# changing the default threshold to flag a variable as high cardinality
data_integrity_model(data=iris, model_name="xgboost", MAX_UNIQUE=50)
an 'integritymodel' object
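To illustrate the kind of check that MAX_UNIQUE controls, here is a minimal base R sketch. It is not the package's implementation: the helper `flag_high_cardinality` and the `toy` data frame are hypothetical names introduced only for this example, and the strict "greater than" comparison is an assumption based on the "above 35 values" wording.

```r
# Hypothetical helper (not part of the package): returns the names of
# character/factor columns whose number of distinct values exceeds max_unique.
flag_high_cardinality <- function(data, max_unique = 35) {
  # Identify categorical columns (factor or character)
  is_cat <- vapply(data, function(x) is.factor(x) || is.character(x), logical(1))
  # Count distinct values in each categorical column
  n_unique <- vapply(data[is_cat], function(x) length(unique(x)), integer(1))
  # Keep only the columns above the threshold (assumed strict comparison)
  names(n_unique)[n_unique > max_unique]
}

# Toy data: 'id' has 50 distinct labels, 'group' has 2, 'value' is numeric
toy <- data.frame(
  id = paste0("id_", 1:50),
  group = rep(c("a", "b"), 25),
  value = seq_len(50),
  stringsAsFactors = FALSE
)

flag_high_cardinality(toy)                   # "id"
flag_high_cardinality(toy, max_unique = 50)  # character(0)
```

Raising the threshold to 50 stops `id` from being flagged, which mirrors what passing a larger MAX_UNIQUE is described as doing in Example 2.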