require(knitr)
# Set so that long lines in R will be wrapped:
opts_chunk$set(tidy.opts=list(width.cutoff=80),tidy=TRUE)
DS.data <- params$DS.data
DD.dict <- params$DD.dict
non.NA.missing.codes <- params$non.NA.missing.codes
threshold <- params$threshold
missingness_summary(DS.data, non.NA.missing.codes = non.NA.missing.codes, threshold = threshold)
In the value_missing_table function, for each variable, we have three sets of possible values:
(1) the set D of all the unique values observed in the data;
(2) the set V of all the values explicitly encoded in the VALUES columns of the data dictionary; and
(3) the set M of the missing value codes defined by the user via the 'non.NA.missing.codes' argument.
This function examines various intersections of these three sets, providing awareness checks about possible issues of concern.
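As a rough illustration of the three sets described above, the following base R sketch builds D, V, and M for a single toy variable and takes a few intersections and differences. The data, the "value=meaning" strings, and all object names are hypothetical; this is not how value_missing_table is implemented internally, only a way to picture the sets.

```r
# Toy inputs (hypothetical; not taken from the package or any real study)
sex_data      <- c(0, 1, 1, 0, 1)                          # observed data for one variable
values_cells  <- c("0=male", "1=female", "-9999=missing")  # VALUES cells, "value=meaning"
missing_codes <- c(-9999, -8888)                           # user-defined missing value codes

# Set D: unique values observed in the data
set_D <- unique(sex_data)

# Set V: encoded values parsed from the VALUES cells (text before the "=")
set_V <- as.numeric(sub("=.*$", "", values_cells))

# Set M: missing value codes supplied by the user
set_M <- missing_codes

D_and_V    <- intersect(set_D, set_V)  # values both observed and defined
M_not_in_D <- setdiff(set_M, set_D)    # missing codes never observed
M_not_in_V <- setdiff(set_M, set_V)    # missing codes with no VALUES entry
```

Here D_and_V is 0 and 1, M_not_in_D contains both missing codes, and M_not_in_V contains only -8888.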
results.list <- value_missing_table(DD.dict, DS.data, non.NA.missing.codes = non.NA.missing.codes)
results <- results.list$report
kable(results$Information$details$CheckA.AllMInD, caption = "Table Check A: List of variables for which user-defined missing value code is not present in the data.")
The above table lists the variables for which the user-defined missing value code(s) are not present in the data. These are not necessarily errors, however, as dbGaPCheckup reads non.NA.missing.codes as "global" missing value codes, even if a specific variable does not contain the code. For example, suppose our study uses a missing value code of -9999, and we have a variable named SEX that is complete with no missing data, containing only the encoded values 0=male and 1=female. In this example, SEX would be flagged in the above variable list since it does not contain a -9999 value. In other words, this variable's presence in the above list is NOT an issue that we should be concerned about. This check is intended only to bring awareness to potential errors in your data (e.g., if you knew that the sex variable was missing for 5 participants in your study, the absence of -9999 in SEX would point to a problem).
Interpretation of table column names:
--> AllMInD: Variable-specific check result communicating whether the user-defined missing value code(s) are detected in the data set (FALSE=no).
--> NsetD: Number of unique values (or levels) detected in the data for a specific variable.
--> NsetM: Number of missing value codes defined.
--> NsetDAndSetM: Number of values found in both the data set and the user-defined missing value codes.
--> MNotInD: User-defined missing value code(s) the function checked for and did not find in the data.
--> MInD: Variable-specific number; user-defined missing value codes detected in the data.
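The SEX example can be mimicked in a few lines of base R. This is a minimal sketch under stated assumptions: the data are invented, and the object names only loosely echo the table columns; they are not the package's internal variables.

```r
# Hypothetical global missing value code and a complete SEX variable
non_na_missing_codes <- c(-9999)
SEX <- c(0, 1, 1, 0)   # 0=male, 1=female; no missing data

set_D <- unique(SEX)            # values observed in the data
set_M <- non_na_missing_codes   # user-defined missing value codes

n_set_D    <- length(set_D)                    # analogue of NsetD
n_set_M    <- length(set_M)                    # analogue of NsetM
n_D_and_M  <- length(intersect(set_D, set_M))  # analogue of NsetDAndSetM
all_M_in_D <- all(set_M %in% set_D)            # analogue of AllMInD
M_not_in_D <- setdiff(set_M, set_D)            # analogue of MNotInD

all_M_in_D   # FALSE: SEX would be listed, although nothing is wrong
```

Because SEX never takes the value -9999, all_M_in_D is FALSE and M_not_in_D is -9999, which matches the reasoning in the text: the flag is informational unless you expected missing values for that variable.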
kable(results$Information$details$CheckB.AllVsInD, caption = "Table Check B: List of variables for which a VALUES entry defines an encoded value, but that value is not present in the data.")
The above table lists variables for which a VALUES entry defines an encoded value (i.e., value=meaning; e.g., 0=male), but that value is not present in the data. While ideally all defined encoded values (i.e., in set V) should be observed in the data (i.e., in set D), it is NOT necessarily an error if one is not.
Interpretation of table column names:
--> AllVsInD: Check result communicating whether all parsed VALUES entries were detected in the data set (FALSE=no).
--> NsetD: Number of unique values (or levels) detected in the data for a specific variable.
--> NsetV: Number of encoded value codes detected for a specific variable.
--> NsetDAndSetV: Number of values found in both the data set and the VALUES entries.
--> VsNotInD: Encoded value(s) not detected in the data.
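To make Check B concrete, here is a small base R sketch with invented inputs. The "2=unknown" entry and all object names are assumptions made for illustration, and the parsing step is a simplification of whatever the package does with the VALUES columns.

```r
# Hypothetical VALUES cells for SEX, including a level that never occurs
values_cells <- c("0=male", "1=female", "2=unknown")
SEX <- c(0, 1, 1, 0)

set_V <- as.numeric(sub("=.*$", "", values_cells))  # encoded values defined
set_D <- unique(SEX)                                # values observed

all_Vs_in_D <- all(set_V %in% set_D)   # analogue of AllVsInD
Vs_not_in_D <- setdiff(set_V, set_D)   # analogue of VsNotInD

Vs_not_in_D   # 2: defined in the dictionary but never observed
```

A result like this may simply mean that no participant fell into the "unknown" category, which is why the check is an awareness flag and not an error.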
kable(results$Information$details$CheckC.AllSetMInSetV, caption = "Table Check C: List of variables for which user-defined missing value code(s) are not defined in a VALUES entry.")
Interpretation of table column names:
--> AllSetMInSetV: Variable-specific check result communicating whether the user-defined missing value code(s) are detected as a VALUES entry (FALSE=no).
--> NsetV: Number of encoded value codes detected for a specific variable.
--> NsetM: Number of missing value codes defined.
--> NsetMAndSetD: Number of values found in both the user-defined missing value codes and the data set.
--> SetMsNotInSetV: Missing value code(s) defined by the user that were not detected in the VALUES entries.
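Check C compares set M against set V. The sketch below uses made-up sets and hypothetical object names to show the idea; it does not reproduce the package's own code.

```r
# Hypothetical sets for one variable
set_M <- c(-9999, -8888)   # user-defined missing value codes
set_V <- c(0, 1, -9999)    # encoded values parsed from VALUES entries

all_M_in_V  <- all(set_M %in% set_V)   # analogue of AllSetMInSetV
Ms_not_in_V <- setdiff(set_M, set_V)   # analogue of SetMsNotInSetV

Ms_not_in_V   # -8888: a missing code with no VALUES entry
```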
kable(results$Information$details$CheckD.All_MInSetD_InSetV, caption = "Table Check D: List of variables for which a user-defined missing value code is present in the data for a given variable, but that variable does not have a corresponding VALUES entry.")
Interpretation of table column names:
--> All_MInSetD_InSetV: Variable-specific check result communicating whether every user-defined missing value code detected in the data for a given variable has a corresponding VALUES entry (FALSE=no).
--> setMInDNotInV: Missing value code(s) detected in the data but not in a corresponding VALUES entry.
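Check D narrows Check C to the missing value codes that actually occur in the data. A minimal sketch, assuming an invented AGE variable with no VALUES entries; the object names are illustrative only.

```r
# Hypothetical variable that contains a missing value code but has no VALUES entries
AGE   <- c(34, 51, -9999, 47)
set_D <- unique(AGE)
set_M <- c(-9999)
set_V <- numeric(0)   # nothing defined in the data dictionary

M_in_D          <- intersect(set_M, set_D)       # missing codes present in the data
M_in_D_not_in_V <- setdiff(M_in_D, set_V)        # analogue of setMInDNotInV
all_M_in_D_in_V <- length(M_in_D_not_in_V) == 0  # analogue of All_MInSetD_InSetV

M_in_D_not_in_V   # -9999: used in the data, undocumented in VALUES
```

In this situation the usual remedy would be to document the code in the data dictionary (for example as -9999=missing), though whether that applies depends on your study.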
kable(results$Information$details$CheckE.All_VNotInM_NotInD, caption = "Table Check E: List of variables for which a VALUES entry is NOT defined as a missing value code AND is NOT identified in the data")
Interpretation of table column names:
--> All_VNotInM_NotInD: Variable-specific check result communicating whether all encoded values that are NOT defined as a missing value code are detected in the data (FALSE=no).
--> setVNotInM_NotInD: Encoded value(s) detected as a VALUES entry but NOT listed as a missing value code and NOT detected in the data.
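Check E can be pictured as two successive set differences: first drop the missing value codes from set V, then drop everything observed in the data. The sets and names below are hypothetical and only illustrate that logic.

```r
# Hypothetical sets for one variable
set_V <- c(0, 1, 2, -9999)   # encoded values from VALUES entries
set_M <- c(-9999)            # user-defined missing value codes
set_D <- c(0, 1, -9999)      # unique values observed in the data

V_not_in_M          <- setdiff(set_V, set_M)       # non-missing encoded values
V_not_in_M_not_in_D <- setdiff(V_not_in_M, set_D)  # analogue of setVNotInM_NotInD
all_V_not_in_M_in_D <- length(V_not_in_M_not_in_D) == 0

V_not_in_M_not_in_D   # 2: a substantive level that was never observed
```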
sessionInfo()