logicCheck: Checking for logical consistency between two categorical...


View source: R/logicCheck.R

Description

This function will check for logical consistency between two categorical variables in a fully or partially synthesized data set.

Usage

logicCheck(obs_data, new_data, vars, NAopt = T)

Arguments

obs_data

The original (observed) data set against which the synthetic data set is compared; must be a "data.frame".

new_data

The fully or partially synthetic data set to be compared to the observed data, of the type "data.frame".

vars

A vector naming two categorical variables, present in both data sets, to check for logical consistency, e.g. c("age17plus", "marriage").

NAopt

Logical. Defaults to TRUE, which includes NAs in the cross-tabulations. Set to FALSE if you do not wish to check NAs.

Details

When a data set is fully or partially synthesized from an observed data set, logical constraints that hold in the observed data may be violated during synthesis, even though the synthesized data should respect them. For example, suppose a data set contains an age variable and an indicator of whether a person holds a driver's license in the state of Pennsylvania. If the indicator shows that the person has a license, the age variable should show that the person is at least 16 years old. It is recommended that you check for data comparability with dataComp() prior to using this function.

This function creates a cross-tabulation of the specified variables for the observed data set and another for the synthesized data set, then checks whether each pair of corresponding cells agrees in being zero or positive. It was developed to make researching synthetic data utility a bit easier by quickly checking for logical consistency.
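The idea can be illustrated with base R alone. The following is a minimal sketch, not the package's implementation: the toy data frames, the variable names age16plus and license, and the specific rule used (a cell that is empty in the observed table but populated in the synthetic table is flagged) are all assumptions made for illustration.

```r
# Toy observed data (hypothetical): nobody under 16 holds a license.
lv <- c("no", "yes")
obs <- data.frame(
  age16plus = factor(c("no", "no", "yes", "yes", "yes"), levels = lv),
  license   = factor(c("no", "no", "no", "yes", "yes"), levels = lv)
)

# Toy synthetic data (hypothetical): the second record is under 16
# but has a license, which the observed data never allows.
syn <- data.frame(
  age16plus = factor(c("no", "no", "yes", "yes", "yes"), levels = lv),
  license   = factor(c("no", "yes", "no", "yes", "yes"), levels = lv)
)

# Cross-tabulate the same two variables in each data set.
obs_tab <- table(obs$age16plus, obs$license, dnn = c("age16plus", "license"))
syn_tab <- table(syn$age16plus, syn$license, dnn = c("age16plus", "license"))

# Assumed rule: empty in the observed table but populated in the
# synthetic table marks a potential logical inconsistency.
flag <- (obs_tab == 0) & (syn_tab > 0)
consistent <- !any(flag)

print(obs_tab)
print(syn_tab)
print(flag)
print(consistent)
```

In this sketch only the cell for age16plus = "no" and license = "yes" is flagged, so the toy synthetic data would be reported as potentially inconsistent.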

Value

This function prints a message stating whether any potential logical inconsistencies were found in the data sets for the specified variables. The cross-tabulations are then printed in either case for the analyst to review.

This function will also return a list of the following components:

consistent

A logical value indicating whether the variable cross-tabulation is logically consistent.

obs.table

The original data set cross-tabulation.

new.table

The new data set cross-tabulation.

which

A matrix indicating whether each cell is logically consistent: 0 means consistent, and any other value means inconsistent.
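One way to read a matrix that follows this convention is to list its nonzero cells. The sketch below builds a hypothetical stand-in for the which component by hand rather than calling logicCheck(); the level labels "no" and "yes" are assumptions, and the real component may be laid out differently.

```r
# Hypothetical stand-in for the `which` component, following the
# documented convention: 0 = consistent, nonzero = inconsistent.
which_mat <- matrix(
  c(0, 1,
    0, 0),
  nrow = 2, byrow = TRUE,
  dimnames = list(age17plus = c("no", "yes"), marriage = c("no", "yes"))
)

# Row and column positions of every nonzero (inconsistent) cell.
bad <- which(which_mat != 0, arr.ind = TRUE)

# Translate the positions into level labels for easier reading.
bad_cells <- data.frame(
  age17plus = rownames(which_mat)[bad[, "row"]],
  marriage  = colnames(which_mat)[bad[, "col"]]
)
print(bad_cells)
```

With a real result, the same lookup could be applied to the which component of the returned list to see which combinations of levels need review.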

Examples

# PPA is the observed data set; PPAps2 is a partially synthetic data set derived from it.
# age17plus and marriage are two categorical variables present in both data sets.

logicCheck(PPA, PPAps2, c("age17plus", "marriage"))

RTIInternational/SynthTools documentation built on Oct. 30, 2019, 10:50 p.m.