knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Les fonctions particulieres au nettoyage

Qu'est-ce que le "all" dans inspect_all ?

inspect_all() est compose de plusieurs checks de nettoyage de donnéées. Ceux-ci peuvent etre accedes directement, comme démontré ci-bas

Load the cleaninginspectoR Package

library("cleaninginspectoR")
library("knitr")

Example data frame

Here we create some fake data for illustration purposes. It is not important to understand this; we keep it in so you can run the example yourself if you like. The dataset contains:

testdf <- data.frame(a= c(runif(98),7287,-100),
                   b=sample(letters,100,T),
                   uuid=c(1:98, 4,20),
                   water.source.other = c(rep(NA,98),"neighbour's well","neighbour's well"),
                   GPS.lat = runif(100)
                   )

Finding duplicates in certain columns

There is a generic function to find duplicates in a certain specified column:

find_duplicates(testdf, duplicate.column.name = "uuid")
knitr::kable(find_duplicates(testdf, duplicate.column.name = "uuid"))

Often this is used on a column with UUID's, so there is a wrapper that looks for "uuid" in the column names and returns duplicates in the first matching column it finds. This gives the same result as the above:

find_duplicates_uuid(testdf)
knitr::kable(find_duplicates_uuid(testdf))

run ?find_duplicates or ?find_duplicates_uuid for details.

Checking for outliers

find_outliers(testdf)
knitr::kable(find_outliers(testdf))

Run ?find_outliers for details

Checking for other responses

find_other_responses(testdf)
knitr::kable(find_other_responses(testdf))

Run ?find_other_responses for details



ellieallien/cleaninginspectoR documentation built on July 18, 2019, 12:30 p.m.