df_get_psi_score: Calculate the feature-level PSI index for the specified...


df_get_psi_score    R Documentation

Calculate the feature-level PSI index for the specified numeric features

Description

This function takes two Spark DataFrames as input: one containing the features with their expected values, and one containing the same features with their actual values. For example, when comparing Oct '18 features to Nov '18 features, Oct '18 is the expected data and Nov '18 is the actual data.

Usage

df_get_psi_score(expected_, actual_, features_)

Arguments

expected_

Required: A matrix containing features with the expected (old) data.

actual_

Required: A matrix containing features with the actual (new) data.

features_

Optional: A vector of the feature names to validate. The feature names must exist in both expected_ and actual_ and be of the same data type in each. If no features are provided, all features in expected_ will be used.

Details

NOTE: This function currently supports only NUMERIC and/or CHARACTER data types. If your data contains other types, filter them out before passing it to the function.
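
To make the per-bin calculation concrete, here is a minimal base R sketch of a PSI computation for a single numeric feature. It is an illustration only and is not taken from this package: it works on plain numeric vectors rather than Spark DataFrames, the helper name psi_sketch and its n_bins and eps arguments are hypothetical, and the quantile-based binning is an assumption (the package may bin differently). It uses the commonly cited PSI definition, the sum over bins of (actual% - expected%) * ln(actual% / expected%).

```r
# Hypothetical helper (not part of this package): PSI for one numeric feature.
psi_sketch <- function(expected, actual, n_bins = 10, eps = 1e-6) {
  # Assumption: bin edges come from quantiles of the expected data.
  # The outer edges are opened up so every actual value lands in a bin.
  edges <- unique(quantile(expected,
                           probs = seq(0, 1, length.out = n_bins + 1),
                           names = FALSE))
  edges[1] <- -Inf
  edges[length(edges)] <- Inf

  # Count observations per bin; empty bins are kept as zero counts.
  expected_count <- as.vector(table(cut(expected, breaks = edges)))
  actual_count   <- as.vector(table(cut(actual, breaks = edges)))

  # Proportions, floored at eps so empty bins do not produce log(0).
  expected_pct <- pmax(expected_count / length(expected), eps)
  actual_pct   <- pmax(actual_count / length(actual), eps)

  data.frame(
    bin            = seq_along(expected_count),
    expected_count = expected_count,
    actual_count   = actual_count,
    psi            = (actual_pct - expected_pct) * log(actual_pct / expected_pct)
  )
}

set.seed(1)
oct <- rnorm(1000)              # expected (old) values
nov <- rnorm(1000, mean = 0.5)  # actual (new) values, shifted upward
bins <- psi_sketch(oct, nov)
psi_total <- sum(bins$psi)      # feature-level PSI: sum of the per-bin terms
```

Each per-bin term is non-negative, so the feature-level total is zero only when the expected and actual bin proportions match. The sketch assumes the feature has at least two distinct quantile values; a constant feature would need separate handling.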

Value

A matrix containing the feature name, bin, min value, max value, expected count, expected


BrandonRCopeland/DataScience documentation built on Oct. 14, 2023, 9:45 a.m.