df_validate: Validate two separate datasets and return validation data on...

View source: R/df_validate.R

df_validate    R Documentation

Validate two separate datasets and return validation data on aggregate and id level features

Description

This function takes two datasets (actual and expected) and validates them in two ways: by how many feature values match exactly at the id level, and by how closely the features agree at the aggregate level.

Usage

df_validate(actual, expected, features, key, matchTestPct = 0.98,
  meanTestPct = 0.02)

Arguments

actual

a base R dataframe containing actual feature values

expected

a base R dataframe containing expected feature values

features

the features to be validated

key

The name of the primary key for actual and expected

matchTestPct

The proportion of exact, id-level matches required to pass test 1 (default 0.98)

meanTestPct

The maximum percent difference allowed between actual and expected features, expressed as a proportion (default 0.02)
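
To make the two thresholds concrete, the following is a minimal base R sketch of the kind of checks the arguments describe. It does not call df_validate and is not taken from its source: the data, the column names (id, x), and the exact formula for the percent difference (relative to the expected mean) are assumptions made for illustration.

```r
# Hypothetical data; the column names id and x are illustrative only.
actual   <- data.frame(id = 1:5, x = c(10, 20, 30, 40, 50))
expected <- data.frame(id = 1:5, x = c(10, 20, 30, 40, 55))

# Align the two datasets on the key so values are compared id by id.
merged <- merge(actual, expected, by = "id",
                suffixes = c(".actual", ".expected"))

# Check 1: proportion of ids whose values match exactly.
match_pct <- mean(merged$x.actual == merged$x.expected)

# Check 2: relative difference between the two column means.
# Assumption: the difference is measured relative to the expected mean.
mean_actual   <- mean(merged$x.actual)
mean_expected <- mean(merged$x.expected)
mean_diff_pct <- abs(mean_actual - mean_expected) / abs(mean_expected)

# Compare against the default thresholds shown in Usage.
passes_match_test <- match_pct >= 0.98
passes_mean_test  <- mean_diff_pct <= 0.02

match_pct      # 0.8 (4 of 5 ids match exactly)
mean_diff_pct  # 1/31, about 0.032
```

With these values both checks fail under the defaults: only 80% of ids match exactly, and the means (30 versus 31) differ by roughly 3.2%.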

Details

NOTE: The columns to be validated must have the same name in each dataset. If categorical features are included, they will be converted to integer values so means can be established. Integer order is determined by an ascending sort of the categorical variable.
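
The conversion described in the note can be illustrated with a short base R sketch. This is one plausible way to get integer codes from an ascending sort, not necessarily how df_validate does it internally; the variable size and its values are hypothetical.

```r
# Hypothetical categorical feature.
size <- c("medium", "small", "large", "small")

# Ascending sort of the distinct values: "large", "medium", "small".
lvls <- sort(unique(size))

# Each value is replaced by its position in the sorted levels.
codes <- as.integer(factor(size, levels = lvls))

codes        # 2 3 1 3
mean(codes)  # 2.25
```

Note that the resulting order is alphabetical rather than semantic ("large" gets the lowest code here), so the mean of the codes is best read as a rough check that the category mix is similar in both datasets.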

Value

A base R dataframe containing one row per feature with match and mean validation data
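
Examples

The call below is a hedged usage sketch built from the signature in Usage. The data and column names (id, score, grade) are hypothetical, and passing features as a character vector of column names is an assumption. The call is guarded so the snippet runs even when df_validate has not been loaded.

```r
# Hypothetical data; column names id, score and grade are illustrative only.
actual <- data.frame(
  id    = 1:4,
  score = c(1.0, 2.0, 3.0, 4.0),
  grade = c("a", "b", "b", "c"),
  stringsAsFactors = FALSE
)
expected <- data.frame(
  id    = 1:4,
  score = c(1.0, 2.0, 3.0, 4.5),
  grade = c("a", "b", "c", "c"),
  stringsAsFactors = FALSE
)

# Run only if df_validate is available in the current session.
# Assumption: features is a character vector of column names.
if (exists("df_validate")) {
  result <- df_validate(actual, expected,
                        features = c("score", "grade"),
                        key = "id",
                        matchTestPct = 0.98,
                        meanTestPct = 0.02)
  print(result)
}
```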


BrandonRCopeland/DataScience documentation built on Oct. 14, 2023, 9:45 a.m.