knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "tools/README-"
)

xray

Travis-CI Build Status CRAN_Status_Badge CRAN_Downloads_Badge

The R Package to have X Ray vision on your datasets. This package lets you analyze the variables of a dataset, to evaluate how is your data shaped. Consider this the first step when you have your data for modeling, you can use this package to analyze all variables and check if there is anything weird worth transforming or even avoiding the variable altogether.

Installation

You can install the stable version of xray from CRAN with:

install.packages("xray")

Or the latest dev version from Github:

# install.packages("devtools")
devtools::install_github("sicarul/xray")

Usage

Anomaly detection

xray::anomalies analyzes all your columns for anomalies, whether they are NAs, Zeroes, Infinite, etc, and warns you if it detects variables with at least 80% of rows with those anomalies. It also warns you when all rows have the same value.

Example:

data(longley)
badLongley=longley
badLongley$GNP=NA
xray::anomalies(badLongley)

Distributions

xray::distributions tries to analyze the distribution of your variables, so you can understand how each variable is statistically structured. It also returns a percentiles table of numeric variables as a result, which can inform you of the shape of the data.

distrLongley=longley
distrLongley$testCategorical=c(rep('One',7), rep('Two', 9))
xray::distributions(distrLongley)

Distributions along a time axis

xray::timebased also investigates into your distributions, but shows you the change over time, so if there is any change in the distribution over time (For example a variable stops or starts being collected) you can easily visualize it.

dateLongley=longley
dateLongley$Year=as.Date(paste0(dateLongley$Year,'-01-01'))
dateLongley$Data='Original'
ndateLongley=dateLongley
ndateLongley$GNP=dateLongley$GNP+10
ndateLongley$Data='Offseted'
xray::timebased(rbind(dateLongley, ndateLongley), 'Year')


sicarul/xray documentation built on May 29, 2019, 12:16 p.m.