knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
inspect_all() is composed of several data cleaning checks. These can also be accessed directly, as demonstrated below.
library("cleaninginspectoR")
library("knitr")
Here we create some fake data for illustration purposes. It is not important to understand this; we keep it in so you can run the example yourself if you like. The dataset contains:
a: random values and outliers
uuid: values should be unique but are not
water.source.other: all NA except for two
GPS.lat: just some numbers, but the column header indicates this is potentially sensitive
testdf <- data.frame(
  a = c(runif(98), 7287, -100),
  b = sample(letters, 100, TRUE),
  uuid = c(1:98, 4, 20),
  water.source.other = c(rep(NA, 98), "neighbour's well", "neighbour's well"),
  GPS.lat = runif(100)
)
There is a generic function to find duplicates in a specified column:
find_duplicates(testdf, duplicate.column.name = "uuid")
knitr::kable(find_duplicates(testdf, duplicate.column.name = "uuid"))
Often this is used on a column of UUIDs, so there is a wrapper that looks for "uuid" in the column names and returns duplicates in the first matching column it finds. This gives the same result as the above:
find_duplicates_uuid(testdf)
knitr::kable(find_duplicates_uuid(testdf))
Run ?find_duplicates or ?find_duplicates_uuid for details.
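To make the idea of the duplicate check concrete, here is a minimal base R sketch of the same kind of logic. It is not the package's implementation: the names find_duplicates_sketch and find_duplicates_uuid_sketch are hypothetical, and the output format is an assumption chosen for illustration.

```r
# Hypothetical helpers, not part of cleaninginspectoR.
# Return the row index and value of every repeated entry in one column.
find_duplicates_sketch <- function(df, column) {
  values <- df[[column]]
  # duplicated() flags each value that has already appeared earlier
  is_dup <- duplicated(values)
  data.frame(index = which(is_dup), value = values[is_dup])
}

# Wrapper: use the first column whose name contains "uuid".
find_duplicates_uuid_sketch <- function(df) {
  matches <- grep("uuid", names(df), ignore.case = TRUE, value = TRUE)
  if (length(matches) == 0) stop("no column name contains 'uuid'")
  find_duplicates_sketch(df, matches[1])
}

# Same uuid column as in the fake data above
toy <- data.frame(uuid = c(1:98, 4, 20))
dups <- find_duplicates_sketch(toy, "uuid")
dups_wrapped <- find_duplicates_uuid_sketch(toy)
print(dups)
```

Both calls flag rows 99 and 100, whose values 4 and 20 already occur earlier in the column.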
find_outliers(testdf)
knitr::kable(find_outliers(testdf))
Run ?find_outliers for details.
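The exact rule the package applies is described in ?find_outliers. As a rough illustration of what an outlier check can look like, the sketch below uses one common convention (values more than three standard deviations from the mean). That rule, the function name find_outliers_sketch and the deterministic toy column are all assumptions made for this example.

```r
# Hypothetical helper, not part of cleaninginspectoR.
# Flags values more than `threshold` standard deviations from the mean.
find_outliers_sketch <- function(x, threshold = 3) {
  z <- abs(x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
  flagged <- which(z > threshold)
  data.frame(index = flagged, value = x[flagged])
}

# Deterministic stand-in for column `a`: 98 values in [0, 1] plus two extremes
x <- c(seq(0, 1, length.out = 98), 7287, -100)
out <- find_outliers_sketch(x)
print(out)
```

With this simple rule only 7287 (row 99) is flagged. The value -100 is missed because the single huge value inflates the standard deviation, which is one reason practical checks often use more robust rules or transformed data.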
find_other_responses(testdf)
knitr::kable(find_other_responses(testdf))
Run ?find_other_responses for details.
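A check for "other" responses can be pictured as: find the columns whose name suggests a free-text "other" field, then list the non-missing values and how often each occurs. The base R sketch below follows that idea. The name find_other_responses_sketch, the name-matching rule and the output columns are assumptions, not the package's actual behaviour.

```r
# Hypothetical helper, not part of cleaninginspectoR.
# Lists non-NA values (with counts) in columns whose name contains "other".
find_other_responses_sketch <- function(df) {
  other_cols <- grep("other", names(df), ignore.case = TRUE, value = TRUE)
  results <- lapply(other_cols, function(col) {
    counts <- table(df[[col]])  # table() drops NA by default
    if (length(counts) == 0) return(NULL)
    data.frame(
      variable = col,
      value = names(counts),
      count = as.integer(counts),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, results)
}

# Same "other" column as in the fake data above, plus an unrelated column
toy <- data.frame(
  water.source.other = c(rep(NA, 98), "neighbour's well", "neighbour's well"),
  GPS.lat = seq(0, 1, length.out = 100)
)
res <- find_other_responses_sketch(toy)
print(res)
```

Listing the distinct values with their counts makes it easy to spot free-text answers that should be recoded into an existing category.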