Introduction

The tidyplusR package is a data cleaning package with features for missing value treatment, data manipulation, and displaying data as markdown tables for documents. It adds functionality on top of the existing data wrangling packages in R. The objective of this package is to provide a few specific functions that solve some of the pressing issues in data cleaning.

Contributors:

Installation

You can install tidyplusR from GitHub using the following:

# install.packages("devtools")
devtools::install_github("UBC-MDS/tidyplusR")

Functions included:

The functions in tidyplusR fall into three main parts: datatype cleansing, missing value imputation, and markdown tables.

Examples

These basic examples show how to solve a common problem with each part of the package:

Datatype cleansing

This part has two functions, typemix and cleanmix.

library(tidyplusR)


# Input data with mixed datatypes (c() coerces each mixed column to character)
dat <- data.frame(x1 = c(1, 2, 3, "1.2.3"),
                  x2 = c("test", "test", 1, TRUE),
                  x3 = c(TRUE, TRUE, FALSE, FALSE))
dat
# Identify the mixed datatypes and clean (remove) them based on the types given
tidyplusR::cleanmix(typemix(dat), column = c(1, 2), type = c("number", "character"))
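
To make the idea of "mixed datatypes" in a column concrete, here is a minimal base R sketch that guesses the type of each value in a column. The helper name guess_type is hypothetical and is not part of tidyplusR; it only illustrates the kind of check involved and makes no claim about how typemix or cleanmix are implemented.

```r
# Hypothetical helper (NOT part of tidyplusR): guess the type of each value.
# Assumption: values arrive as character, as they do after c() mixes types.
guess_type <- function(x) {
  x <- as.character(x)
  # A value counts as a number if as.numeric() can parse it
  is_num <- !is.na(suppressWarnings(as.numeric(x)))
  # A value counts as logical if it is literally "TRUE" or "FALSE"
  is_lgl <- x %in% c("TRUE", "FALSE")
  ifelse(is_lgl, "logical", ifelse(is_num, "number", "character"))
}

x1 <- c(1, 2, 3, "1.2.3")
x2 <- c("test", "test", 1, TRUE)
guess_type(x1)
guess_type(x2)
```

In this sketch the fourth entry of x1 is flagged as character because "1.2.3" does not parse as a number, while x2 turns out to contain character, number, and logical values.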

Missing Value imputation

# Dummy data frame
dat <- data.frame(x = sample(letters[1:3], 20, TRUE),
                  y = sample(letters[1:3], 20, TRUE),
                  w = as.numeric(sample(0:50, 20, TRUE)),
                  z = sample(letters[1:3], 20, TRUE),
                  b = as.logical(sample(0:1, 20, TRUE)),
                  a = sample(0:100, 20, TRUE),
                  stringsAsFactors = FALSE)

dat[c(5, 10, 15), 1] <- NA
dat[c(3, 7), 2] <- NA
dat[c(1, 3, 5), 3] <- NA
dat[c(4, 5, 9), 4] <- NA
dat[c(4, 5, 9), 5] <- NA
dat[, 4] <- factor(dat[, 4])
dat[c(4, 5, 9), 6] <- NA
# Input data with missing values
dat
# Calling the impute function
# Missing values replaced with method = "mode"
tidyplusR::impute(dat, method = "mode")   # method can also be "median" or "mean"
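
For readers unfamiliar with mode imputation, the following base R sketch shows the general technique: each missing value is replaced by the most frequent non-missing value in its column. The helpers stat_mode and impute_mode are hypothetical names, not tidyplusR functions, and the tie-breaking rule (first value seen wins) is an assumption of this sketch rather than documented behaviour of impute.

```r
# Hypothetical helper (NOT part of tidyplusR): most frequent non-missing value.
# Ties are broken in favour of the value that appears first (an assumption).
stat_mode <- function(x) {
  ux <- unique(x[!is.na(x)])
  # tabulate() silently ignores the NA positions produced by match()
  ux[which.max(tabulate(match(x, ux)))]
}

# Hypothetical helper: replace NA in every column with that column's mode
impute_mode <- function(df) {
  df[] <- lapply(df, function(col) {
    col[is.na(col)] <- stat_mode(col)
    col
  })
  df
}

small <- data.frame(x = c("a", "b", "a", NA),
                    w = c(1, NA, 3, 3),
                    stringsAsFactors = FALSE)
impute_mode(small)
```

Because the mode is defined for any column type, the same sketch works for character, numeric, logical, and factor columns, which is why "mode" is a convenient default for a mixed data frame like the one above.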

Markdown table

## default: ncol = 2 and nrow = 2, alignment = "l"
md_new()

## 3 by 3 table
md_new(nrow = 3, ncol = 3)

## different alignments:
md_new(nrow = 1, align = "c")
md_new(nrow = 1, align = "r")

## providing header
h <- c("foo", "boo")
md_new(header = h)

## table from a data frame: rows 1 to 3, columns 1 to 4
md_data(mtcars, row.index = 1:3, col.index = 1:4)

## alignment to right
md_data(mtcars, row.index = 1:3, col.index = 1:4, align = "r")

## provide header
md_data(mtcars, row.index = 1:3, col.index = 1:4, header = c("a","b","c","d"))

## exclude row names
md_data(mtcars, row.index = 1:3, col.index = 1:4, row.names = FALSE)
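
The output of md_data is not shown here, so as a rough illustration of what a markdown table built from a data frame looks like, here is a base R sketch. The function to_md is a hypothetical name and not part of tidyplusR; it omits row names and uses the pipe-table alignment markers (:---, :---:, ---:) from GitHub Flavored Markdown, which may differ from what md_data emits.

```r
# Hypothetical helper (NOT part of tidyplusR): data frame -> markdown table lines.
# Row names are not included in this sketch.
to_md <- function(df, align = c("l", "c", "r")) {
  align <- match.arg(align)
  # Alignment marker for the separator row of a pipe table
  sep <- switch(align, l = ":---", c = ":---:", r = "---:")
  row_line <- function(cells) paste0("| ", paste(cells, collapse = " | "), " |")
  body <- vapply(seq_len(nrow(df)), function(i) {
    row_line(vapply(df[i, , drop = FALSE], as.character, character(1)))
  }, character(1))
  c(row_line(names(df)), row_line(rep(sep, ncol(df))), body)
}

# Print the first 3 rows and 4 columns of mtcars as a right-aligned table
cat(to_md(mtcars[1:3, 1:4], align = "r"), sep = "\n")
```

Returning a character vector of lines, rather than printing directly, keeps the helper easy to test and lets the caller decide whether to print the table or write it to a file.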

Existing features in the R ecosystem similar to tidyplusR

License

MIT

Contributing

This is an open source project. Please follow the guidelines below for contribution.

- Open an issue for any feedback and suggestions.
- For contributing to the project, please refer to Contributing for details.



UBC-MDS/tidyplusR documentation built on May 25, 2019, 1:36 p.m.