data_integrity_model: Check data integrity model
In funModeling: Exploratory Data Analysis and Data Preparation Tool-Box

data_integrity_model

R Documentation

Check data integrity model

Description

Models such as xgboost, random forest, and regression each impose their own constraints on the data types they accept, and many errors during model creation come from nothing more than data format. Given a data frame and a model name, this function returns the constraints of that model which the data does not satisfy, so they can be anticipated and corrected before the model is created. This function is closely related to data_integrity.
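
As a rough illustration of the kind of checks described above, the following base R sketch flags columns that contain missing values, columns stored as character, and categorical columns with too many distinct values. It is not funModeling's implementation: the helper `check_constraints_sketch`, its `max_unique` argument, and the names in its result are hypothetical and chosen only for this example.

```r
# Hypothetical helper, NOT part of funModeling: reports which columns would
# violate three typical model constraints.
check_constraints_sketch <- function(data, max_unique = 35) {
  # Categorical here means factor or character columns
  is_cat <- vapply(data, function(x) is.factor(x) || is.character(x), logical(1))
  is_chr <- vapply(data, is.character, logical(1))
  has_na <- vapply(data, anyNA, logical(1))
  # Number of distinct non-missing values per column
  n_unique <- vapply(data, function(x) length(unique(x[!is.na(x)])), integer(1))

  list(
    vars_with_na   = names(data)[has_na],
    vars_character = names(data)[is_chr],
    vars_high_card = names(data)[is_cat & n_unique > max_unique]
  )
}

# Toy data: one numeric column with an NA, one character column
df <- data.frame(
  age  = c(50, NA, 61),
  city = c("a", "b", "c"),
  stringsAsFactors = FALSE
)

# A deliberately low threshold so the character column is flagged
res <- check_constraints_sketch(df, max_unique = 2)
print(res)
```

A model that does not allow missing values would trip on every column listed in `vars_with_na`, and one that only accepts numeric input would trip on those in `vars_character`.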

Usage

data_integrity_model(data, model_name, MAX_UNIQUE = 35)

Arguments

`data`	data frame or a single vector
`model_name`	model name. To see all the available models, print the 'metadata_models' data frame.
`MAX_UNIQUE`	maximum number of unique values a categorical variable may have before it is flagged as high cardinality. Above 35 values, the number of distinct values normally needs to be reduced.

Examples

# Example 1:
data_integrity_model(data=heart_disease, model_name="pca")

# Example 2:
# changing the default threshold to flag a variable as high cardinality
data_integrity_model(data=iris, model_name="xgboost", MAX_UNIQUE=50)
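
When a variable is flagged as high cardinality under the MAX_UNIQUE threshold, one common way to reduce the number of distinct values is to keep the most frequent ones and group the rest into a single category. The base R sketch below shows that idea under stated assumptions: `lump_rare_levels` and the "other" label are hypothetical, and this is only one of several possible reduction strategies, not something provided by this function.

```r
# Hypothetical helper, NOT part of funModeling: keeps the most frequent
# values and groups the remainder so that at most `max_unique` levels remain.
lump_rare_levels <- function(x, max_unique = 35, other = "other") {
  x <- as.character(x)
  # Frequency of each distinct non-missing value, most frequent first
  freq <- sort(table(x), decreasing = TRUE)
  # Nothing to do if the variable is already within the threshold
  if (length(freq) <= max_unique) {
    return(factor(x))
  }
  # Reserve one level for the lumped category
  top <- names(freq)[seq_len(max_unique - 1)]
  x[!is.na(x) & !(x %in% top)] <- other
  factor(x)
}

# Toy vector with five distinct values, reduced to at most three levels
x <- c(rep("a", 5), rep("b", 3), "c", "d", "e")
reduced <- lump_rare_levels(x, max_unique = 3)
print(table(reduced))
```

Missing values are left as they are, so any constraint on NA still has to be handled separately.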