
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)

nacleanR

DESCRIPTION

The goal of nacleanR is to provide functions that aid in data cleansing. It includes functions to:
Probe the percentage of missing values in each variable
Find valid and invalid variables based on their percentage of missing data
Remove variables whose missing values exceed a user-defined limit from a dataset
Calculate an age variable in years from an existing calendar-year variable in the dataset by subtracting it from the current system year

Installation

Update: The package is currently not available on CRAN. Please install the development version from GitHub.
Once a CRAN release is available, you will be able to install it with:

install.packages("nacleanR")

Install the development version from GitHub with:

# install.packages("devtools")
devtools::install_github("anandjage/nacleanR")

Example `percent_na(dataset)`

This basic example shows how to find the percentage of missing values (NA) in each variable:

library(nacleanR)
## basic example code
csv <- system.file("extdata", 'nadata.csv', package = "nacleanR")
sample_data <- read_data(path = csv)
percent_na(sample_data)
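
For readers who want to see the underlying idea, here is a minimal base R sketch of a per-variable NA percentage. It is not the nacleanR implementation: `percent_na_sketch` and the `toy` data frame are hypothetical names used only for illustration.

```r
# Hypothetical helper, not part of nacleanR: percentage of NA values per column.
percent_na_sketch <- function(data) {
  # is.na() on a data frame gives a logical matrix; colMeans() gives the
  # share of TRUE (missing) entries per column, scaled here to a percentage.
  colMeans(is.na(data)) * 100
}

# Small made-up dataset with a known pattern of missing values
toy <- data.frame(
  a = c(1, NA, 3, NA),        # 2 of 4 values missing
  b = c("x", "y", NA, "z"),   # 1 of 4 values missing
  c = c(TRUE, FALSE, TRUE, TRUE)
)

pct <- percent_na_sketch(toy)
print(pct)  # a = 50, b = 25, c = 0
```

The result is a named numeric vector, so individual variables can be looked up by name, for example `pct["a"]`.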

Example `invalidcols(dataset,threshold)`

This basic example lists the variables whose missing values (NA) are above the user-defined threshold:

nacleanR::invalidcols(data = sample_data, threshold = 50)

Example `validcols(dataset,threshold)`

This basic example lists the variables whose missing values (NA) are within the user-defined threshold:

nacleanR::validcols(data = sample_data, threshold = 50)
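
The split into invalid and valid variables can be pictured with the base R sketch below. It assumes the threshold is a percentage and that a variable sitting exactly at the threshold counts as valid; how nacleanR treats that boundary is not stated here. The `_sketch` function names and the `toy` data are hypothetical.

```r
# Hypothetical helpers, not the nacleanR implementation.
na_pct <- function(data) colMeans(is.na(data)) * 100

# Variables whose NA percentage is strictly above the threshold
invalid_cols_sketch <- function(data, threshold) {
  pct <- na_pct(data)
  names(pct)[pct > threshold]
}

# Variables whose NA percentage is at or below the threshold (assumed boundary)
valid_cols_sketch <- function(data, threshold) {
  pct <- na_pct(data)
  names(pct)[pct <= threshold]
}

toy <- data.frame(
  a = c(1, NA, NA, NA),  # 75% missing
  b = c(1, 2, NA, 4),    # 25% missing
  c = 1:4                # 0% missing
)

invalid <- invalid_cols_sketch(toy, threshold = 50)
valid <- valid_cols_sketch(toy, threshold = 50)
print(invalid)  # "a"
print(valid)    # "b" "c"
```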

Example `select_cols(dataset,threshold)`

This basic example returns the dataset after removing the variables whose missing values exceed the user-defined threshold:

new_data <- nacleanR::select_cols(data = sample_data, threshold = 50)
new_data
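
As a rough illustration of the same filtering in base R, the sketch below drops columns whose NA percentage exceeds the threshold. It is an assumption-based stand-in rather than the package's own code; `select_cols_sketch` and `toy` are hypothetical names.

```r
# Hypothetical helper, not the nacleanR implementation.
select_cols_sketch <- function(data, threshold) {
  # Logical vector: TRUE for columns at or below the NA-percentage threshold
  keep <- colMeans(is.na(data)) * 100 <= threshold
  # drop = FALSE keeps the result a data frame even if one column remains
  data[, keep, drop = FALSE]
}

toy <- data.frame(
  a = c(1, NA, NA, NA),  # 75% missing, removed at threshold = 50
  b = c(1, 2, NA, 4),    # 25% missing, kept
  c = 1:4                # 0% missing, kept
)

reduced <- select_cols_sketch(toy, threshold = 50)
print(names(reduced))  # "b" "c"
```

Rows are left untouched; only whole variables are removed.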

Example `age_cal(dataset,variable)`

Calculates age in years by subtracting a year variable from the current system year. The example below stores each result as a new column in the dataset.

csv <- system.file("extdata", "agedata.csv", package = "nacleanR")
agedata <- read_data(csv)
agedata$ageTodaySinceBuilt <- age_cal(agedata,"YearBuilt")
agedata$ageTodaySinceRenovated <- age_cal(agedata, "YearRenovated")
head(agedata)
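
The arithmetic behind this step can be sketched in base R as follows. This is not the nacleanR source: `age_cal_sketch`, its `current_year` argument, and the `toy` data are hypothetical, and `current_year` is added here only so the example gives the same result whenever it is run.

```r
# Hypothetical helper, not the nacleanR implementation.
age_cal_sketch <- function(data, variable,
                           current_year = as.integer(format(Sys.Date(), "%Y"))) {
  # Age in whole calendar years: current year minus the year column
  current_year - data[[variable]]
}

toy <- data.frame(YearBuilt = c(1990, 2005, NA))

# Fix the reference year so the output is reproducible
toy$age <- age_cal_sketch(toy, "YearBuilt", current_year = 2024)
print(toy$age)  # 34 19 NA
```

Missing years stay `NA` in the result, so it may be worth checking the year variable for missing values before computing ages.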