Validates the schema of datasets containing outcome variables.

Description

The NlsyLinks package handles much of the plumbing code needed to transform extracted NLSY datasets into a format that statistical routines can interpret. In some cases, a dataset of measured variables is needed, with one row per subject. This function validates the measured/outcome dataset to ensure it possesses an interpretable schema. For a specific list of the requirements, see Details below.

Usage

ValidateOutcomeDataset(dsOutcome, outcomeNames)

Arguments

dsOutcome

A data.frame with the measured variables.

outcomeNames

The column names of the measured variables that will eventually be used by a statistical procedure.

Details

The dsOutcome parameter must:

  1. Have a non-missing value.

  2. Contain at least one row.

  3. Contain a column called 'SubjectTag' (case sensitive).

  4. Have a SubjectTag column that contains only positive numbers.

  5. Have a SubjectTag column in which all values are unique (i.e., two rows/subjects cannot share the same value).

The outcomeNames parameter must:

  1. Have a non-missing value.

  2. Contain only column names that are present in the dsOutcome data frame.
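The requirements above can be sketched in base R. This is a minimal illustration, not the NlsyLinks source code: the function name check_outcome_schema and its error messages are hypothetical, and it assumes that "non-missing" means the argument was supplied and that an NA in SubjectTag counts as a failure.

```r
# Hypothetical helper that mirrors the documented rules.
# It is NOT the implementation used by NlsyLinks.
check_outcome_schema <- function(dsOutcome, outcomeNames) {
  # dsOutcome rules 1-2: supplied, a data frame, and at least one row.
  if (missing(dsOutcome) || !is.data.frame(dsOutcome)) {
    stop("dsOutcome must be a data frame.")
  }
  if (nrow(dsOutcome) < 1L) {
    stop("dsOutcome must contain at least one row.")
  }

  # Rule 3: a column named exactly 'SubjectTag' (case sensitive).
  if (!("SubjectTag" %in% colnames(dsOutcome))) {
    stop("dsOutcome must contain a column called 'SubjectTag'.")
  }
  tags <- dsOutcome[["SubjectTag"]]

  # Rule 4: only positive numbers (NA treated as a failure; an assumption).
  if (!is.numeric(tags) || anyNA(tags) || any(tags <= 0)) {
    stop("SubjectTag must contain only positive numbers.")
  }

  # Rule 5: no two rows may share a SubjectTag value.
  if (anyDuplicated(tags) > 0L) {
    stop("SubjectTag values must be unique.")
  }

  # outcomeNames rules: supplied, and every name is a column of dsOutcome.
  if (missing(outcomeNames) || length(outcomeNames) == 0L || anyNA(outcomeNames)) {
    stop("outcomeNames must have a non-missing value.")
  }
  absent <- setdiff(outcomeNames, colnames(dsOutcome))
  if (length(absent) > 0L) {
    stop("Columns not found in dsOutcome: ", paste(absent, collapse = ", "))
  }

  TRUE
}

# A small made-up dataset that satisfies every rule.
dsToy <- data.frame(
  SubjectTag       = c(101, 102, 201),
  MathStandardized = c(0.3, -1.2, 0.8)
)
check_outcome_schema(dsToy, "MathStandardized")
```

The checks run in the order the rules are listed, so the first violated rule determines the error message.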

Value

Returns TRUE if the validation passes. Throws an error (with a descriptive message) if the validation fails.
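Because a failed validation stops execution rather than returning a value, a script that should continue past a bad dataset needs to trap the error. The sketch below shows one way to do that with base R's tryCatch. The wrapper passes_validation and the stand-in validator are hypothetical and exist only so the snippet runs without the package installed.

```r
# Hypothetical wrapper: converts "TRUE or error" into "TRUE or FALSE".
# 'validate' is any function that returns TRUE on success and throws on failure.
passes_validation <- function(validate, ...) {
  tryCatch(
    isTRUE(validate(...)),
    error = function(e) {
      # Surface the validator's descriptive message, then report failure.
      message("Validation failed: ", conditionMessage(e))
      FALSE
    }
  )
}

# Stand-in validator, used here only for demonstration.
standInValidator <- function(ok) {
  if (!ok) stop("schema check did not pass")
  TRUE
}

passes_validation(standInValidator, ok = TRUE)
passes_validation(standInValidator, ok = FALSE)
```

With NlsyLinks loaded, the same wrapper could presumably be called as passes_validation(ValidateOutcomeDataset, dsOutcome=ds, outcomeNames=outcomeNames).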

Author(s)

Will Beasley

Examples

library(NlsyLinks) #Load the package into the current R session.
ds <- ExtraOutcomes79
outcomeNames <- c("MathStandardized", "WeightZGenderAge")
ValidateOutcomeDataset(dsOutcome=ds, outcomeNames=outcomeNames) #Returns TRUE.
outcomeNamesBad <- c("MathMisspelled", "WeightZGenderAge")
#ValidateOutcomeDataset(dsOutcome=ds, outcomeNames=outcomeNamesBad) #Throws error.
