data_check: Data quality check function

View source: R/data_check.R

data_check    R Documentation

Data quality check function

Description

Conducts data quality checks on the input data, covering missingness, variation, pairwise correlation, and variance inflation factors (VIF) of the covariates.

Usage

data_check(Y, Z, ID)

Arguments

Y

a numeric vector indicating the outcome variable.

Z

a matrix or data frame representing covariates.

ID

a numeric vector representing the provider identifier.

Details

The function performs the following checks:

  • Missingness: Checks for any missing values in the dataset and provides a summary of missing data.

  • Variation: Identifies covariates with zero or near-zero variance which might affect model stability.

  • Correlation: Analyzes pairwise correlation among covariates and highlights highly correlated pairs.

  • VIF: Computes the Variance Inflation Factors to identify covariates with potential multicollinearity issues.
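
The four checks listed above can be illustrated with a minimal base-R sketch on simulated data. This is not the pprof implementation: the simulated covariates `x1` to `x4`, the 0.9 correlation cutoff, and all object names are assumptions made for illustration, and the cutoffs and output format used by data_check may differ.

```r
# Illustrative sketch only; thresholds and names are hypothetical.
set.seed(1)
n <- 200
Z <- data.frame(
  x1 = rnorm(n),
  x2 = rnorm(n),
  x3 = rep(1, n)                     # constant column: zero variance
)
Z$x4 <- Z$x1 + rnorm(n, sd = 0.05)   # nearly collinear with x1
Z$x2[c(3, 17)] <- NA                 # inject two missing values

# Missingness: number of NA values per covariate
n_missing <- colSums(is.na(Z))

# Variation: covariates with zero variance (computed on complete rows)
Zc <- Z[complete.cases(Z), ]
vars <- sapply(Zc, var)
zero_var <- names(vars)[vars == 0]

# Correlation: pairs with |r| above an assumed cutoff of 0.9
Zv <- Zc[, vars > 0, drop = FALSE]
R <- cor(Zv)
high <- which(abs(R) > 0.9 & upper.tri(R), arr.ind = TRUE)
high_pairs <- data.frame(
  var1 = rownames(R)[high[, "row"]],
  var2 = colnames(R)[high[, "col"]],
  r = R[high]
)

# VIF: 1 / (1 - R_j^2), with R_j^2 from regressing covariate j on the others
vif <- sapply(names(Zv), function(j) {
  fit <- lm(reformulate(setdiff(names(Zv), j), response = j), data = Zv)
  1 / (1 - summary(fit)$r.squared)
})
```

In this sketch, `n_missing` flags `x2`, `zero_var` flags `x3`, and both `high_pairs` and `vif` point to the `x1`/`x4` pair. A VIF above roughly 10 is a commonly cited rule of thumb for problematic multicollinearity.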

If issues arise when using the model functions logis_fe, linear_fe, or linear_re, call this function to check the quality of the input data.

Value

No return value, called for side effects.

Examples

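library(pprof)
data(ExampleDataBinary)           # example data set used by the package
outcome <- ExampleDataBinary$Y    # outcome vector
covar <- ExampleDataBinary$Z      # covariate matrix or data frame
ID <- ExampleDataBinary$ID        # provider identifier
data_check(outcome, covar, ID)    # called for its side effects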


pprof documentation built on April 12, 2025, 1:33 a.m.