In ellieallien/cleaninginspectoR: Basic checks that data cleaning occurred

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

The individual cleaning functions

What is the "all" in inspect_all?

inspect_all() is composed of several data cleaning checks. These can also be called directly, as demonstrated below.

Load the cleaninginspectoR Package

library("cleaninginspectoR")
library("knitr")

Example data frame

Here we create some fake data for illustration purposes. You do not need to understand this code; it is included so you can run the example yourself. The dataset contains:

variable a: random values plus two outliers
variable uuid: values should be unique but are not
variable water.source.other: all NA except for two values
variable GPS.lat: just some numbers, but the column header indicates this is potentially sensitive

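# Build a 100-row example data frame with known problems
testdf <- data.frame(
  a = c(runif(98), 7287, -100),              # random values plus two outliers
  b = sample(letters, 100, replace = TRUE),  # random letters
  uuid = c(1:98, 4, 20),                     # 4 and 20 each appear twice
  water.source.other = c(rep(NA, 98), "neighbour's well", "neighbour's well"),
  GPS.lat = runif(100)                       # column name suggests sensitive data
)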

Finding duplicates in certain columns

There is a generic function to find duplicates in a specified column:

find_duplicates(testdf, duplicate.column.name = "uuid")

knitr::kable(find_duplicates(testdf, duplicate.column.name = "uuid"))

This is often used on a column of UUIDs, so there is a wrapper that looks for "uuid" in the column names and returns duplicates in the first matching column it finds. This gives the same result as the call above:

find_duplicates_uuid(testdf)

knitr::kable(find_duplicates_uuid(testdf))

Run ?find_duplicates or ?find_duplicates_uuid for details.
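To make the idea behind a duplicate check concrete, here is a minimal base-R sketch. It is not the code behind find_duplicates(); it only assumes that "duplicate" means a value occurring more than once in the column, and it uses a stand-in vector with the same values as the uuid column of testdf. The names is_dup and dup_rows are made up for this illustration.

```r
# Base-R sketch of a duplicate check; not cleaninginspectoR's own code.
# Stand-in for the uuid column of testdf.
uuid <- c(1:98, 4, 20)

# Flag every value that occurs more than once, including its first occurrence.
is_dup <- duplicated(uuid) | duplicated(uuid, fromLast = TRUE)

# Row indices and values of the duplicated entries.
dup_rows <- data.frame(index = which(is_dup), value = uuid[is_dup])
dup_rows
```

Combining duplicated() with its fromLast = TRUE variant reports both copies of each repeated value, which is usually what you want when deciding which record to keep.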

Checking for outliers

find_outliers(testdf)

knitr::kable(find_outliers(testdf))

Run ?find_outliers for details.
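The rule find_outliers() applies is described in its help page. As a rough illustration of what an outlier check can look like, the base-R sketch below flags values that lie more than three scaled median absolute deviations from the median. Both the median/MAD rule and the threshold of 3 are assumptions chosen for this example and may differ from the package's method. The vector a is a deterministic stand-in for column a of testdf.

```r
# Illustrative median/MAD outlier rule in base R; an assumption for this
# sketch, not necessarily the rule find_outliers() applies.
# Deterministic stand-in for column a: 98 values in [0, 1] plus two extremes.
a <- c(seq(0, 1, length.out = 98), 7287, -100)

# Robust centre and spread: the median and the scaled median absolute deviation.
centre <- median(a)
spread <- mad(a)

# Flag values more than 3 scaled MADs from the median (hypothetical threshold).
is_outlier <- abs(a - centre) > 3 * spread

outliers <- data.frame(index = which(is_outlier), value = a[is_outlier])
outliers
```

A robust rule is used here because a mean and standard deviation rule on this data would miss -100: the single value 7287 inflates the standard deviation so much that -100 falls well within three standard deviations of the mean.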

Checking for other responses

find_other_responses(testdf)

knitr::kable(find_other_responses(testdf))

Run ?find_other_responses for details.
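For intuition, a check for "other" responses can be approximated in base R by picking out columns whose name contains "other" and tabulating their non-NA values. This is a sketch under that assumption only; how find_other_responses() actually selects columns and formats its output is described in its help page. The data frame df and the names other_cols and counts are made up for this example.

```r
# Base-R sketch of an "other responses" check; not cleaninginspectoR's own code.
# Small stand-in data frame with one free-text "other" column.
df <- data.frame(
  uuid = 1:5,
  water.source.other = c(NA, "neighbour's well", NA, "neighbour's well", "river"),
  stringsAsFactors = FALSE
)

# Assumption: columns holding "other" responses have "other" in their name.
other_cols <- grep("other", names(df), ignore.case = TRUE, value = TRUE)

# Count each distinct non-NA response per matching column.
counts <- lapply(df[other_cols], function(x) table(x, useNA = "no"))
counts
```

Responses that occur repeatedly, such as "neighbour's well" here, are candidates for recoding into an existing or new answer category.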

ellieallien/cleaninginspectoR documentation built on July 18, 2019, 12:30 p.m.
