knitr::opts_chunk$set(echo = TRUE) library(epiuf)
The data check suite: core functions to extract each element of a data check, and wrapper functions to generate outputs in each format we use. Description of each function (not all created yet)
countIf() : the count of records matching a given condition
listIf() : the list of specified variable (usually Id) for records matching a given condition. If no condition given lists whole dataset. Can specify to remove NAs. 4 outputs possible: 1) vector of Ids eg c(1, 2, 3, 4) 2) string of Ids eg "1, 2, 3, 4" 3) vector of Ids with comparison to second dataset and repeats Ids listed separately from new eg list(repeat = c(1,2), new = c(3,4)) 4) string of Ids with comparison to second dataset with repeats highlighted eg "[1, 2 ] 3, 4"
printIf() : output for bullet point in rmd. Prints given caption, the count of records and list of Ids in string format if under a specified threshold. 1 or 2 data set entry possible. NB. currently called printIf2 so as not to interfere with the current printIf(), which it will replace.
printKableIf() : output for kable in rmd. Prints given caption, the count of records and list of Ids in string format if under a specified threshold for group of input conditions and captions into a table with a header. 1 or 2 data set entry possible. Replacement for printDataCheck().
exportTableIf() : output for excel table. Prints given caption, the count of records and list of Ids in vector format (ie ID section has number of columns to match threshold given) if under a specified threshold for multiple groups of input conditions and captions with header each into a data frame ready for insertion to excel table. 1 or 2 data set entry possible.
Attention: Must specify each parameter.
Issues:
Trouble shooting listIf(). This is the most important and most complicated of the functions!
Use of subset and lazy processing causing difficulty in wrapper functions.
df1 <- data.frame(Id = c(1,2,3,4,5,6,7,8,9,10, NA_real_) , A = c("a", "b", "b", "d", "b","a", "b", "b", "d", "e", "b") , B = c("", "dog", "cat", "rabbit", "mole", "dog", "horse", "cat", "", NA_character_, "mouse") ) df2 <- data.frame(Id = c(6,7,8,9,10, 11,12,13,14,15, NA_real_) , A = c("a", "b", "b", "d", "b","f", "b", "b", "d", "e", "b") , B = c("dog", "horse", "cat", "", "mouse","mole", "antelope", "antelope", "hamster", "", "mouse") )
countIf(data=df1, cond = A=="b") countIf(df1, A=="b")
listIf(data=df1, varname="Id", cond = A=="b") listIf(data=df1, varname="Id", cond = A=="b", na.rm=TRUE) listIf(data=df1, varname="Id", cond = A=="b") listIf(df1, "Idcheck", A=="b") listIf(df1, "Idcheck") listIf(df1, "Id", collapse=TRUE, na.rm=TRUE)
# printIf2(data=df1, cond = A=="b", text = "B's in df1 only", varname="Id") # output is 7 and 2 # printIf2(data=df1, data_old=df2, cond = A=="b",text = "B's in df1 and df2", varname = "Id") # output is 7 in both, 2 in df1 only
dataCheck(data=df1, cond = A=="b"&B=="mouse", text = "B's and mice in df1 only", varname="Id") # output should be null dataCheck(data=df1, data_old=df2, cond = A=="b"&B=="mouse", text = "B's and mice in df1 and df2", varname = "Id") # output should be null dataCheck(data=df1, data_old=df2, cond = A=="b"&B=="horse", text = "B's and horses in df1 and df2", varname = "Id") # output should be 7 from both.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.