knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

DESCRIPTION

The goal of nacleanR is to provide functions that aid in data cleaning. It comprises functions to:
Probe the percentage of missing values in each variable
Identify valid and invalid variables based on their percentage of missing data
Remove variables whose percentage of missing values exceeds a user-defined limit
Calculate an age variable in years by subtracting an existing calendar-year variable in the dataset from the current system year

Installation

Update: the package is currently not available on CRAN.
Please use GitHub to install the development version.
If a released version becomes available on CRAN, it could be installed with:

install.packages("nacleanR")

And the development version from GitHub with:

# install.packages("devtools")
devtools::install_github("anandjage/nacleanR")

Example percent_na(dataset)

This basic example shows how to find the percentage of missing values (NA) in each variable:

library(nacleanR)
csv <- system.file("extdata", 'nadata.csv', package = "nacleanR")
sample_data <- read_data(path = csv)
percent_na(sample_data)
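
To see what this percentage means without the package, here is a minimal base-R sketch. The data frame `toy` and its columns are made up for illustration. This is not necessarily how percent_na() is implemented or how it formats its result.

```r
# Hypothetical toy data; not the nadata.csv file shipped with the package
toy <- data.frame(
  a = c(1, NA, 3, NA),
  b = c("x", "y", NA, "z"),
  c = c(TRUE, FALSE, TRUE, TRUE)
)

# is.na() on a data frame gives a logical matrix;
# the mean of each column is the fraction of NA values in it
pct_missing <- colMeans(is.na(toy)) * 100
pct_missing
```

Here `a` has 2 of 4 values missing (50%), `b` has 1 of 4 (25%), and `c` has none (0%).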

Example invalidcols(dataset,threshold)

This basic example shows the variables whose percentage of missing values (NA) is above the user-defined threshold:

library(nacleanR)
invalidcols(data = sample_data,threshold = 50)

Example validcols(dataset,threshold)

This basic example shows the variables whose percentage of missing values (NA) is within the user-defined threshold:

library(nacleanR)
validcols(data = sample_data,threshold = 50)
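
The split between valid and invalid variables can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's own code: the data frame `toy` is hypothetical, the threshold is taken to be a percentage on a 0 to 100 scale (as in the calls above), and "above the threshold" is assumed to be a strict comparison.

```r
# Hypothetical toy data for illustration only
toy <- data.frame(
  a = c(1, NA, NA, NA),  # 75% missing
  b = c(1, 2, NA, 4),    # 25% missing
  c = 1:4                # 0% missing
)

threshold <- 50  # percent; assumed 0-100 scale

pct_missing <- colMeans(is.na(toy)) * 100

# Assumption: strictly above the threshold counts as invalid
invalid <- names(pct_missing)[pct_missing > threshold]
valid   <- names(pct_missing)[pct_missing <= threshold]

invalid
valid
```

Whether a variable sitting exactly at the threshold counts as valid is a design choice; check the package documentation for how invalidcols() and validcols() treat that boundary.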

Example select_cols(dataset,threshold)

This basic example returns the dataset after removing the variables whose percentage of missing values is above the user-defined threshold:

library(nacleanR)
new_data <- select_cols(data = sample_data,threshold = 50)
new_data
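
The same kind of column filtering can be sketched with base R subsetting. The data frame `toy` is hypothetical, and the strict "greater than" comparison is an assumption; select_cols() itself may differ in details such as boundary handling.

```r
# Hypothetical toy data for illustration only
toy <- data.frame(
  a = c(1, NA, NA, NA),  # 75% missing
  b = c(1, 2, NA, 4),    # 25% missing
  c = 1:4                # 0% missing
)

threshold <- 50  # percent; assumed 0-100 scale

# Logical vector: TRUE for columns at or below the threshold
keep <- colMeans(is.na(toy)) * 100 <= threshold

# drop = FALSE keeps the result a data frame even if one column remains
reduced <- toy[, keep, drop = FALSE]
names(reduced)
```

All rows are retained; only the columns with too many missing values are dropped.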

Example age_cal(dataset,variable)

Calculates age in years by subtracting a year variable from the current system year. The returned vector can be assigned to a new column of the dataset:

library(nacleanR)
csv <- system.file("extdata", "agedata.csv", package = "nacleanR")
agedata <- read_data(csv)
agedata$ageTodaySinceBuilt <- age_cal(agedata,"YearBuilt")
agedata$ageTodaySinceRenovated <- age_cal(agedata, "YearRenovated")
head(agedata)
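
The arithmetic behind this calculation can be sketched in base R. The data frame `houses` and the column name `age` are hypothetical, and this is not necessarily how age_cal() is implemented.

```r
# Hypothetical toy data; not the agedata.csv file shipped with the package
houses <- data.frame(YearBuilt = c(1990, 2005, NA))

# Extract the current calendar year from the system date
current_year <- as.integer(format(Sys.Date(), "%Y"))

# Age in whole years; a missing year yields a missing age
houses$age <- current_year - houses$YearBuilt
houses
```

Because the result depends on the system date, the ages change when the code is run in a later year.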


anandjage/nacleanr documentation built on July 13, 2019, 3:06 a.m.