df_model_monitoring_distributions: Bin and calculate feature distributions for model monitoring


df_model_monitoring_distributions    R Documentation

Bin and calculate feature distributions for model monitoring

Description

This function takes two matrices as input: one containing the features with their expected values, and one containing the same features with their actual values. For example, when comparing October 2018 features to November 2018 features, October 2018 is the expected data and November 2018 is the actual data.

Usage

df_model_monitoring_distributions(expected, actual, features)

Arguments

expected

Required: A matrix containing features with the expected (old) data.

actual

Required: A matrix containing features with the actual (new) data.

features

Optional: A vector of the feature names to validate. Note that each feature name must exist in both expected and actual and must have the same data type in both. If no features are provided, all features in expected will be used.
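
A minimal sketch of preparing inputs that satisfy the argument requirements above. The data, column names (age, channel), and the same_type check are hypothetical and only for illustration. The inputs are built as data frames, which is an assumption: the argument text mentions both matrices and data frames, and a single R matrix cannot hold numeric and character columns together. The call at the end runs only if the function is available in the session.

```r
# Hypothetical example data: October 2018 as expected, November 2018 as actual.
set.seed(42)
expected <- data.frame(
  age     = rnorm(100, mean = 40, sd = 10),
  channel = sample(c("web", "store", "phone"), 100, replace = TRUE),
  stringsAsFactors = FALSE
)
actual <- data.frame(
  age     = rnorm(100, mean = 43, sd = 12),
  channel = sample(c("web", "store", "phone"), 100, replace = TRUE),
  stringsAsFactors = FALSE
)

# Features to compare; each must exist in both inputs.
features <- c("age", "channel")
stopifnot(all(features %in% names(expected)),
          all(features %in% names(actual)))

# Each feature must also have the same data type in both inputs.
same_type <- vapply(
  features,
  function(f) class(expected[[f]])[1] == class(actual[[f]])[1],
  logical(1)
)
stopifnot(all(same_type))

# Call the function only if the package that provides it is loaded.
if (exists("df_model_monitoring_distributions", mode = "function")) {
  distributions <- df_model_monitoring_distributions(expected, actual, features)
  print(head(distributions))
}
```

Checking names and types up front mirrors the constraints stated for the features argument, so a mismatch surfaces before the function is called.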

Details

NOTE: This function currently supports only NUMERIC and CHARACTER data types. Furthermore, bins with fewer than 5 occurrences are combined into an "other" bucket.
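
The "other" bucket rule can be illustrated with a short sketch. This is not the package's implementation: collapse_rare and min_count are hypothetical names, the sketch handles a single character vector only, and it assumes occurrences are counted within that one vector.

```r
# Hypothetical helper: relabel categories with fewer than `min_count`
# occurrences as a single "other" bucket.
collapse_rare <- function(x, min_count = 5) {
  counts <- table(x)
  rare <- names(counts)[counts < min_count]
  x[x %in% rare] <- "other"
  x
}

# Hypothetical character feature: "phone" (3) and "fax" (1) are rare.
channel <- c(rep("web", 12), rep("store", 7), rep("phone", 3), rep("fax", 1))
binned <- collapse_rare(channel)

# Count per bin after collapsing.
print(table(binned))
```

Here "phone" and "fax" fall below the threshold and are merged, leaving three bins: "other" with 4 occurrences, "store" with 7, and "web" with 12.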

Value

A matrix containing the feature name, bin, min value, max value, expected count, expected


BrandonRCopeland/DataScience documentation built on Oct. 14, 2023, 9:45 a.m.