diagnose: Diagnose the random split

View source: R/diagnose.R

diagnose    R Documentation

Description

Diagnose the random split

Usage

diagnose(
  dataset.name,
  df.train,
  df.test,
  model.relation = "",
  metric.performance = "Normalized AIC",
  num.simulations = 200,
  alpha = 0.05,
  save.plots = TRUE,
  output.dir = "Output"
)

Arguments

dataset.name

Name of the dataset (string)

df.train

Training partition (R data frame)

df.test

Test partition (R data frame)

model.relation

The model formula used for the regression model, e.g. Rings ~ LongestShell + Diameter + Height; defaults to "" (no model relation)

metric.performance

The performance metric; defaults to "Normalized AIC"

num.simulations

Number of simulations to run; defaults to 200

alpha

Significance level of the test used by visualize_threshold; defaults to 0.05

save.plots

Saves plots in output.dir when set to TRUE

output.dir

Path to the output directory where the plots are saved; defaults to "Output"
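
Because model.relation refers to columns of both df.train and df.test, it can help to confirm that every variable in the formula exists in both partitions before calling diagnose. The sketch below is illustrative only: missing_vars is a hypothetical helper that is not part of this package, and the toy data frames merely stand in for real partitions.

```r
# Hypothetical helper (not part of the package): return the variables that a
# formula references but that are absent from a data frame.
missing_vars <- function(relation, df) {
  setdiff(all.vars(relation), names(df))
}

# Toy data standing in for df.train / df.test
df.train <- data.frame(y = c(1, 2, 3), x1 = c(0.1, 0.2, 0.3), x2 = c(5, 6, 7))
df.test  <- data.frame(y = c(4, 5), x1 = c(0.4, 0.5))

relation <- y ~ x1 + x2

missing_train <- missing_vars(relation, df.train)  # nothing missing
missing_test  <- missing_vars(relation, df.test)   # x2 is missing here
```

An empty result for both partitions means the formula can be evaluated on either one. Formulas that use the `.` shorthand would need separate handling, since all.vars returns "." for them.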

Value

The following three plots are plotted:

Examples


# ------------------------- Example 1 ------------------------------
# data preparation
dataset.name <- "Abalone"
data(abalone)
split.percentage <- 0.8

# initial random split of data
s <- sample(x = seq_len(nrow(abalone)), size = floor(nrow(abalone) * split.percentage), replace = FALSE)
df.train <- abalone[s, ]
df.test <- abalone[-s, ]

# defining model relation based on variables of data
model.relation <- Rings ~ LongestShell + Diameter + Height

# function call
diagnose(dataset.name, df.train, df.test, model.relation = model.relation,
  metric.performance = "Normalized AIC", num.simulations = 200,
  alpha = 0.05, save.plots = TRUE, output.dir = "Output")

# without model relation
diagnose(dataset.name, df.train, df.test, num.simulations = 200,
  alpha = 0.05, save.plots = TRUE, output.dir = "Output")

# ------------------------- Example 2 ------------------------------

# data preparation
dataset.name <- "Diamonds"
data(diamonds)
split.percentage <- 0.8

# initial random split of data
s <- sample(x = seq_len(nrow(diamonds)), size = floor(nrow(diamonds) * split.percentage), replace = FALSE)
df.train <- diamonds[s, ]
df.test <- diamonds[-s, ]

# defining model relation based on variables of data
model.relation <- price ~ x:y:z + depth

# function call
diagnose(dataset.name, df.train, df.test, model.relation = model.relation,
  metric.performance = "Normalized AIC", num.simulations = 200,
  alpha = 0.05, save.plots = TRUE, output.dir = "Output")

# without model relation
diagnose(dataset.name, df.train, df.test, num.simulations = 200,
  alpha = 0.05, save.plots = TRUE, output.dir = "Output")
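
Both examples build the initial random split in the same way, so the step can be wrapped in a small function. The following is a minimal sketch, not part of the package: split_random and its seed argument are hypothetical names introduced here, and the built-in mtcars data set is used only so that the snippet runs without extra data.

```r
# Hypothetical helper (not part of the package): randomly split a data frame
# into train and test partitions, optionally with a fixed seed.
split_random <- function(df, split.percentage = 0.8, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  n <- nrow(df)
  # draw row indices for the training partition without replacement
  s <- sample(x = seq_len(n), size = floor(n * split.percentage), replace = FALSE)
  list(train = df[s, , drop = FALSE], test = df[-s, , drop = FALSE])
}

parts <- split_random(mtcars, split.percentage = 0.8, seed = 42)

# the partitions are disjoint and together cover every row
n.train <- nrow(parts$train)
n.test  <- nrow(parts$test)
overlap <- intersect(rownames(parts$train), rownames(parts$test))
```

The resulting parts$train and parts$test can then be passed as df.train and df.test. Fixing the seed makes a particular split repeatable, which is convenient when comparing diagnostics across runs.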

eklavyaj/RandomSplitDiagnostics documentation built on June 1, 2022, 8:36 p.m.