splitratio: Optimal splitting ratio

View source: R/SPlit.R

splitratio {SPlit}    R Documentation

Optimal splitting ratio

Description

splitratio() finds the optimal splitting ratio by assuming that a polynomial regression model with interactions can approximate the true model. With method = “regression”, the number of parameters in this model is estimated from the full data using stepwise regression. A simpler alternative, and the default (method = “simple”), is to set the number of parameters to the square root of the number of unique rows in the input matrix x. See Joseph (2022) for details.
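As a rough illustration of the default rule, the sketch below assumes the result reported in Joseph (2022) that the optimal training-to-testing ratio is √p : 1, so the testing fraction is 1/(√p + 1), with p taken as the square root of the number of unique rows of x. The helper simple_split_ratio() is hypothetical and is not the package's source code.

```r
# Hypothetical sketch of the "simple" rule; not SPlit's implementation.
# Assumes the sqrt(p):1 training:testing ratio of Joseph (2022),
# i.e. a testing fraction of 1 / (sqrt(p) + 1).
simple_split_ratio <- function(x) {
  x <- as.matrix(x)              # accept a vector or a matrix of inputs
  n_unique <- nrow(unique(x))    # count distinct input rows only
  p <- sqrt(n_unique)            # "simple" choice for the number of parameters
  1 / (sqrt(p) + 1)              # fraction of the data used for testing
}

set.seed(1)
X <- rnorm(100)                  # 100 distinct values, so p = sqrt(100) = 10
simple_split_ratio(X)            # 1 / (sqrt(10) + 1), about 0.24

m <- matrix(1:32, ncol = 2)      # 16 distinct rows
simple_split_ratio(rbind(m, m))  # duplicates ignored: p = 4, ratio = 1/3
```

Because only unique rows are counted, stacking two copies of the same 16-row matrix gives the same ratio as the 16 rows alone.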

Usage

splitratio(x, y, method = "simple", degree = 2)

Arguments

x

Input matrix

y

Response (output variable)

method

This can be “simple” or “regression”. The default, “simple”, uses the square root of the number of unique rows in x as the number of parameters, whereas “regression” estimates the number of parameters using stepwise regression. The “regression” method works only with a continuous output variable.
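A minimal sketch of what a regression-based estimate of p could look like, under stated assumptions: the full polynomial with interactions is built with stats::poly(raw = TRUE), backward selection by AIC is done with stats::step(), and p is the number of retained coefficients (intercept included). The helper regression_split_ratio() is hypothetical; the package's actual stepwise procedure may differ.

```r
# Hypothetical sketch; not SPlit's code. Assumes a full polynomial of total
# degree <= `degree` (with interactions) and AIC-based backward selection.
regression_split_ratio <- function(x, y, degree = 2) {
  x <- as.matrix(x)
  # All monomials of total degree 1..degree; strip poly() attributes
  P <- matrix(poly(x, degree = degree, raw = TRUE), nrow = nrow(x))
  colnames(P) <- paste0("t", seq_len(ncol(P)))  # syntactic column names
  d <- data.frame(y = y, P)
  fit <- step(lm(y ~ ., data = d), trace = 0)   # backward elimination by AIC
  p <- length(coef(fit))                        # retained terms incl. intercept
  list(p = p, ratio = 1 / (sqrt(p) + 1))        # assumed sqrt(p):1 rule
}

set.seed(1)
X <- rnorm(100)
Y <- rnorm(100, mean = X^2, sd = 1)  # continuous, quadratic response
regression_split_ratio(X, Y)
```

Backward selection is used here only because it is what step() does by default when no scope is given; a forward or bidirectional search would be an equally valid reading of "stepwise regression".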

degree

Degree of the polynomial to be fitted. It is used only when method = “regression”. Default is 2.
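Assuming the candidate model contains every monomial up to the given degree (a "polynomial regression model with interactions"), the full model in k input variables has choose(k + degree, degree) coefficients including the intercept, which bounds the p that a stepwise search can return. The helper n_poly_terms() is hypothetical and only illustrates this count.

```r
# Number of coefficients (intercept included) in a full polynomial of
# total degree `degree` in k input variables. Hypothetical helper.
n_poly_terms <- function(k, degree) choose(k + degree, degree)

n_poly_terms(k = 1, degree = 2)  # 3: 1, x, x^2
n_poly_terms(k = 3, degree = 2)  # 10: 1, 3 linear, 3 squares, 3 interactions

# Cross-check with stats::poly(), which omits the intercept column
x <- matrix(runif(30), ncol = 3)
ncol(poly(x, degree = 2, raw = TRUE)) + 1  # 10
```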

Value

Splitting ratio, which is the fraction of the dataset to be used for testing.
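The returned value is a fraction, not a row count. As a worked example under the √p : 1 rule assumed from Joseph (2022): with p = 9, the testing fraction is 1/(√9 + 1) = 0.25, so 25 of 100 rows go to testing and 75 to training. The sketch below makes that arithmetic concrete with an ordinary random split via base R's sample(); split_by_ratio() is hypothetical and is not the SPlit splitting procedure.

```r
# Hypothetical helper: plain random train/test split for a given
# testing fraction. This is NOT the SPlit algorithm.
split_by_ratio <- function(n, ratio) {
  n_test <- round(ratio * n)                 # rows held out for testing
  test_idx <- sample(seq_len(n), n_test)     # random test indices
  list(train = setdiff(seq_len(n), test_idx), test = test_idx)
}

set.seed(42)
s <- split_by_ratio(100, 1 / (sqrt(9) + 1))  # ratio = 0.25
length(s$test)   # 25
length(s$train)  # 75, i.e. 75:25 = 3:1 = sqrt(9):1
```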

References

Joseph, V. R. (2022). Optimal Ratio for Data Splitting. Statistical Analysis & Data Mining: The ASA Data Science Journal, to appear.

Examples

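library(SPlit)
set.seed(1)                                # reproducible example data
X = rnorm(n=100, mean=0, sd=1)             # single input variable
Y = rnorm(n=100, mean=X^2, sd=1)           # quadratic response plus noise
splitratio(x=X, y=Y)                       # default "simple" method
splitratio(x=X, y=Y, method="regression")  # p estimated by stepwise regression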


SPlit documentation built on March 22, 2022, 9:06 a.m.