This package assesses the quality of a random split of a dataset.
The analysis is based on a modified version of the Mahalanobis distance, a distance measure for multivariate data.
After the user supplies an initial split and the model relation for the regression (as an R formula), the diagnose()
function returns a conclusion about the split, together with a plot showing the evidence behind that conclusion.
library(RandomSplitDiagnostics)
# data preparation
dataset_name <- "Abalone"
data(abalone)
# initial random split of data (70% train, 30% test)
s <- sample(x = seq_len(nrow(abalone)), size = floor(nrow(abalone) * 0.7), replace = FALSE)
df_train <- abalone[s, ]
df_test <- abalone[-s, ]
# defining model relation based on variables of data
model.relation <- Rings ~ LongestShell + Diameter + Height
# function call
diagnose(dataset_name, df_train, df_test, model.relation = model.relation,
         metric.performance = "Normalized AIC", num.simulations = 200,
         alpha = 0.05, save.plots = TRUE, output.dir = "Output")
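For intuition about the distance behind the analysis, here is a minimal sketch of the classical Mahalanobis distance applied to a 70/30 split like the one above. It is not the package's method: the package uses a modified version whose details are not given here. The built-in mtcars dataset, the chosen columns, the seed, and the names center, covmat and d2_test are all assumptions made for illustration.

```r
# Sketch only: classical Mahalanobis distance via base R (stats::mahalanobis).
# The package's modified distance and its decision rule are NOT reproduced here.
set.seed(1)                              # arbitrary seed so the sketch is repeatable
df <- mtcars[, c("mpg", "hp", "wt")]     # built-in data standing in for a real dataset

# 70/30 random split, mirroring the example above
s <- sample(x = seq_len(nrow(df)), size = floor(nrow(df) * 0.7), replace = FALSE)
df_train <- df[s, ]
df_test  <- df[-s, ]

# Squared Mahalanobis distance of each test row from the training mean,
# scaled by the training covariance matrix
center  <- colMeans(df_train)
covmat  <- cov(df_train)
d2_test <- mahalanobis(df_test, center = center, cov = covmat)

summary(d2_test)
```

Large values in d2_test would indicate test rows that lie far from the bulk of the training data, which is the kind of discrepancy between the two parts of a split that a distance-based diagnostic looks for.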