View source: R/makeDataReport.R
Make a data overview report that summarizes the contents of a dataset and flags potential problems. The potential problems are identified by running a set of class-specific validation checks, so that different checks are performed on different variable types. The checking steps can be customized according to user input and/or the data type of the inputted variable. The checks are saved to an R markdown file which can be rendered into an easy-to-read data report in PDF, HTML or Word format. This report also includes summaries and visualizations of each variable in the dataset.
makeDataReport(
  data,
  output = NULL,
  render = TRUE,
  useVar = NULL,
  ordering = c("asIs", "alphabetical"),
  onlyProblematic = FALSE,
  labelled_as = c("factor"),
  mode = c("summarize", "visualize", "check"),
  smartNum = TRUE,
  preChecks = c("isKey", "isSingular", "isSupported"),
  file = NULL,
  replace = FALSE,
  vol = "",
  standAlone = TRUE,
  twoCol = TRUE,
  quiet = TRUE,
  openResult = TRUE,
  summaries = setSummaries(),
  visuals = setVisuals(),
  checks = setChecks(),
  listChecks = TRUE,
  maxProbVals = 10,
  maxDecimals = 2,
  addSummaryTable = TRUE,
  codebook = FALSE,
  reportTitle = NULL,
  treatXasY = NULL,
  includeVariableList = TRUE,
  ...
)
data |
The dataset to be checked. This dataset should be of class |
output |
Output format. Options are |
render |
Should the output file be rendered (defaults to TRUE)? |
useVar |
Variables to describe in the report.
If |
ordering |
Choose the ordering of the variables in the variable presentation. The options are "asIs" (ordering as in the dataset) and "alphabetical" (alphabetical order). |
onlyProblematic |
A logical. If |
labelled_as |
A string explaining the way to handle labelled and haven_labelled vectors.
Currently |
mode |
Vector of tasks to perform among the three categories "summarize", "visualize" and "check".
The default, |
smartNum |
If |
preChecks |
Vector of function names for check functions used in the pre-check stage. The pre-check stage consists of variable checks that should be performed before the summary/visualization/checking step. If any of these checks finds a problem, the variable will not be summarized, visualized, or checked. |
file |
The filename of the outputted rmarkdown (.Rmd) file.
If set to |
replace |
If |
vol |
Extra text string or numeric that is appended to the end of the output
file name(s). For example, if the dataset is called "myData", no file argument is
supplied and |
standAlone |
A logical. If |
twoCol |
A logical. Should the results from the summarize and visualize
steps be presented in two columns? Defaults to TRUE. |
quiet |
A logical. If |
openResult |
A logical. If |
summaries |
A list of summaries to use on each supported variable type. We recommend
using |
visuals |
A list of visual functions to use on each supported variable type. We recommend
using |
checks |
A list of checks to use on each supported variable type. We recommend
using |
listChecks |
A logical. Controls whether the checks that were used for each
possible variable type are summarized in the output. Defaults to TRUE. |
maxProbVals |
A positive integer or |
maxDecimals |
A positive integer or |
addSummaryTable |
A logical. If |
codebook |
A logical. Defaults to FALSE. |
reportTitle |
A text string. If supplied, this will be the printed title of the report. If left unspecified, the title will be the name of the supplied dataset. |
treatXasY |
A list that indicates how non-standard variable classes should be treated.
This parameter allows you to include variables that are not of class |
includeVariableList |
A logical indicating whether the results of the summarize/visualize/check steps
should be added to the report. Defaults to TRUE. |
... |
Other arguments that are passed on to the pre-check, checking, summary and visualization functions. |
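The pre-check stage described under preChecks can be explored outside of a full report. The following is a minimal sketch that calls the three default pre-check functions directly on the columns of a small data frame. The data frame `df` and its column names are made up for illustration, and the sketch assumes that each pre-check function accepts a single vector and returns a result with a `problem` element, as the `checkResult()` call in the examples section suggests.

```r
library(dataMaid)

# Hypothetical toy data: an ID-like character column, a constant column,
# and an ordinary numeric column.
df <- data.frame(
  id = paste0("p", 1:10),
  const = rep("a", 10),
  x = c(1.5, 2, 3.25, 4, 5, 6, 7, 8, 9, 10),
  stringsAsFactors = FALSE
)

# The default pre-check functions, taken from the usage section.
preChecks <- c("isKey", "isSingular", "isSupported")

# Run every pre-check on every variable and collect the problem flags.
# Rows correspond to pre-checks, columns to variables.
flags <- sapply(df, function(v) {
  sapply(preChecks, function(f) do.call(f, list(v))$problem)
})
print(flags)
```

A variable with a TRUE flag in any row would, according to the argument description, be left out of the summarize/visualize/check steps.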
For each variable, a set of pre-check functions (controlled by the preChecks argument) is run first, and then a battery of functions is applied depending on the variable class. For each variable type, the summarize/visualize/check functions are applied and the results are written to an R markdown file.
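The two-stage flow described above can be illustrated with a small base R sketch. This is a conceptual illustration only, not the package's implementation: all function names (`is_constant`, `class_checks`, `screen_variable`) and the individual checks are hypothetical stand-ins.

```r
# Conceptual sketch only: hypothetical helper names, not dataMaid internals.

# Hypothetical pre-check: flag variables with at most one distinct value.
is_constant <- function(v) length(unique(v[!is.na(v)])) <= 1

# Hypothetical class-specific checks, keyed by variable class.
class_checks <- list(
  numeric = list(
    has_missing = function(v) anyNA(v),
    has_negative = function(v) any(v < 0, na.rm = TRUE)
  ),
  character = list(
    has_missing = function(v) anyNA(v),
    has_whitespace = function(v) any(grepl("^\\s|\\s$", v))
  )
)

screen_variable <- function(v) {
  # Stage 1: pre-checks. A failing pre-check stops further processing.
  if (is_constant(v)) {
    return(list(skipped = TRUE, results = NULL))
  }
  # Stage 2: pick the battery of functions that matches the variable class.
  checks <- class_checks[[class(v)[1]]]
  if (is.null(checks)) {
    return(list(skipped = TRUE, results = NULL))
  }
  results <- vapply(checks, function(f) f(v), logical(1))
  list(skipped = FALSE, results = results)
}

df <- data.frame(
  score = c(1.5, -2, NA, 4),
  name = c("ann", " bob", "cy", "dee"),
  const = c("a", "a", "a", "a"),
  stringsAsFactors = FALSE
)

# Apply the two-stage screening to every variable in the data frame.
report <- lapply(df, screen_variable)
str(report)
```

In this sketch the `const` column is stopped at the pre-check stage, while `score` and `name` each receive the checks registered for their class.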
The function does not return anything. Its side effect (the production of a data report) is the reason for running the function.
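Because the report is produced as a side effect, a script that needs to confirm it was created has to check for the output file itself. A minimal sketch, assuming that `file` accepts a path ending in `.Rmd` and that `render = FALSE` writes only the R markdown file; the file name `DF_report.Rmd` is arbitrary:

```r
library(dataMaid)

DF <- data.frame(x = 1:15)

# Write the .Rmd into the session's temporary directory.
# Rendering and opening are switched off so that only the file is produced.
rmd <- file.path(tempdir(), "DF_report.Rmd")
makeDataReport(DF, file = rmd, render = FALSE,
               openResult = FALSE, replace = TRUE)

# The function's result is the file on disk, so check for that.
file.exists(rmd)
```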
Petersen AH, Ekstrøm CT (2019). "dataMaid: Your Assistant for Documenting Supervised Data Quality Screening in R." Journal of Statistical Software, 90(6), 1-38. doi: 10.18637/jss.v090.i06.
data(testData)
data(toyData)
check(toyData)
## Not run:
DF <- data.frame(x = 1:15)
makeDataReport(DF)
## End(Not run)
## Not run:
data(testData)
makeDataReport(testData)
## End(Not run)
# Overwrite any existing files generated by makeDataReport
## Not run:
makeDataReport(testData, replace=TRUE)
## End(Not run)
# Change output format to Word/docx:
## Not run:
makeDataReport(testData, replace=TRUE, output = "word")
## End(Not run)
# Only include problematic variables in the output document
## Not run:
makeDataReport(testData, replace=TRUE, onlyProblematic=TRUE)
## End(Not run)
# Add user defined check-function to the checks performed on character variables:
# Here we add functionality to search for the string wally (ignoring case)
## Not run:
wheresWally <- function(v, ...) {
  res <- grepl("wally", v, ignore.case=TRUE)
  problem <- any(res)
  message <- "Wally was found in these data"
  checkResult(list(problem = problem,
                   message = message,
                   problemValues = v[res]))
}
wheresWally <- checkFunction(wheresWally,
  description = "Search for the string 'wally' ignoring case",
  classes = c("character")
)
# Add the newly defined function to the list of checks used for characters.
makeDataReport(testData,
  checks = setChecks(character = defaultCharacterChecks(with = "wheresWally")),
  replace=TRUE)
## End(Not run)
# Handle non-supported variable classes using treatXasY: treat raw as character and
# treat complex as numeric. We also add a list variable, but as lists are not
# handled through treatXasY, this variable will be caught in the preChecks and skipped:
## Not run:
toyData$rawVar <- as.raw(c(1:14, 1))
toyData$compVar <- c(1:14, 1) + 2i
toyData$listVar <- as.list(c(1:14, 1))
makeDataReport(toyData, replace = TRUE,
  treatXasY = list(raw = "character", complex = "numeric"))
## End(Not run)