knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.path = "tools/README-" )
xray is an R package that gives you X-ray vision on your datasets. It lets you analyze the variables of a dataset to evaluate how your data is shaped. Consider this the first step once you have your data ready for modeling: use the package to analyze every variable and check whether anything looks odd enough to warrant transforming a variable or dropping it altogether.
You can install the stable version of xray from CRAN with:
install.packages("xray")
Or the latest dev version from GitHub:
# install.packages("devtools")
devtools::install_github("sicarul/xray")
xray::anomalies analyzes all your columns for anomalies, such as NAs, zeroes, or infinite values, and warns you if it detects variables where at least 80% of rows have those anomalies. It also warns you when all rows have the same value.
Example:
data(longley)
badLongley <- longley
badLongley$GNP <- NA
xray::anomalies(badLongley)
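To make the 80% rule more concrete, here is a rough base-R sketch of the kind of check described above. It is not xray's actual implementation: `anomaly_share` is a hypothetical helper written for illustration, and the 0.8 cutoff is simply taken from the description.

```r
# Hypothetical helper (not part of xray): share of rows per column that are
# NA, zero, or infinite.
anomaly_share <- function(df) {
  sapply(df, function(col) {
    is_na   <- is.na(col)
    is_zero <- !is_na & is.numeric(col) & col == 0
    is_inf  <- !is_na & is.numeric(col) & is.infinite(col)
    mean(is_na | is_zero | is_inf)
  })
}

data(longley)
badLongley <- longley
badLongley$GNP <- NA

share <- anomaly_share(badLongley)

# Columns where at least 80% of rows are anomalous
flagged <- names(share)[share >= 0.8]

# Columns where every row holds the same value
constant <- names(badLongley)[
  vapply(badLongley, function(col) length(unique(col)) == 1, logical(1))
]

print(flagged)
print(constant)
```

In this sketch only `GNP` is flagged, since every other `longley` column is fully populated with non-zero values.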
xray::distributions analyzes the distribution of your variables, so you can understand how each variable is statistically structured. It also returns a percentiles table for the numeric variables, which gives you a sense of the shape of the data.
distrLongley <- longley
distrLongley$testCategorical <- c(rep('One', 7), rep('Two', 9))
xray::distributions(distrLongley)
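For readers who want a feel for what a percentiles table looks like, the following base-R sketch builds one with `quantile()`. The probability levels in `probs` are chosen for illustration only and are an assumption, not necessarily the ones xray reports.

```r
data(longley)

# Illustrative probability levels (an assumption, not xray's documented set)
probs <- c(0.01, 0.1, 0.25, 0.5, 0.75, 0.9, 0.99)

# Keep only the numeric columns
numeric_cols <- longley[vapply(longley, is.numeric, logical(1))]

# One row per variable, one column per percentile
percentiles <- t(sapply(numeric_cols, quantile, probs = probs, na.rm = TRUE))

print(round(percentiles, 2))
```

Comparing the distance between, say, the 25% and 50% columns with the distance between the 50% and 75% columns gives a quick hint of skew in a variable.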
xray::timebased also looks at your distributions, but shows how they change over time, so if a distribution shifts (for example, a variable stops or starts being collected) you can easily visualize it.
dateLongley <- longley
dateLongley$Year <- as.Date(paste0(dateLongley$Year, '-01-01'))
dateLongley$Data <- 'Original'
ndateLongley <- dateLongley
ndateLongley$GNP <- dateLongley$GNP + 10
ndateLongley$Data <- 'Offseted'
xray::timebased(rbind(dateLongley, ndateLongley), 'Year')
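The "variable stops being collected" case mentioned above can also be checked numerically. The sketch below uses base R only and a made-up scenario in which `GNP` goes missing from 1957 onward; it illustrates the idea and is not how xray computes its plots.

```r
data(longley)
d <- longley
d$Year <- as.Date(paste0(d$Year, "-01-01"))

# Hypothetical scenario: GNP stops being collected from 1957 onward
d$GNP[d$Year >= as.Date("1957-01-01")] <- NA

# Share of missing GNP values per year
na_by_year <- tapply(is.na(d$GNP), format(d$Year, "%Y"), mean)

print(na_by_year)
```

A jump from 0 to 1 in this table marks the point where the variable disappears, which is the kind of change a time-based plot makes visible at a glance.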