SplitTrainTest | R Documentation |
For modeling we usually split our data frame into a training sample, on which we fit the model, and a test sample, on which we check how well the model works. This function splits a given data frame into two parts, the training sample and the test sample, returned as a list with two elements.
SplitTrainTest(x, p = 0.1, seed = NULL, logical = FALSE)
x |
data.frame |
p |
proportion for test sample. Default is 10%. |
seed |
initialization for random number generator. |
logical |
logical, defining whether a logical vector should be returned instead of the list with train and test data. Default is FALSE. |
In order to obtain reasonable models, we should ensure two points: the test set must be large enough to yield statistically meaningful results, and it should be representative of the data set as a whole. Assuming that our test set meets these two conditions, our goal is to create a model that generalizes well to new data. We are aiming for a model that predicts training and test data equally well. We should never train on test data. If we see surprisingly good results on the evaluation metrics, it might be a sign that we are accidentally training on the test set.
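The points above can be illustrated with a small base R sketch. It does not use SplitTrainTest itself. The built-in mtcars data, the 80/20 proportion, the seed, the lm formula and the rmse helper are all assumptions chosen for illustration only.

```r
# Illustrative only: a manual 80/20 split of the built-in mtcars data,
# done with base R so that it does not depend on any package.
set.seed(1)                                   # arbitrary seed, for reproducibility
n        <- nrow(mtcars)
test_idx <- sample(n, size = round(0.2 * n))  # row numbers held out for testing
train    <- mtcars[-test_idx, ]
test     <- mtcars[test_idx, ]

# Fit on the training rows only; the test rows are never passed to lm().
fit <- lm(mpg ~ wt + hp, data = train)

# Root mean squared error, used here as the evaluation metric (hypothetical helper).
rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))

rmse_train <- rmse(train$mpg, predict(fit, newdata = train))
rmse_test  <- rmse(test$mpg,  predict(fit, newdata = test))

# No row may appear in both samples; a non-empty overlap would mean leakage.
overlap <- intersect(rownames(train), rownames(test))

c(train = rmse_train, test = rmse_test)
```

A test error far above the training error suggests that the model does not generalize well. A test error that looks too good is a reason to check the overlap between the two samples.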
If logical is FALSE, a list with two data frames, train and test, of the same structure as the given data in x.
If logical is TRUE, a logical vector containing nrow(x) elements of TRUE and FALSE.
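The two return shapes can be sketched in base R as follows. This is not the package's source code. The name split_sketch is hypothetical, and two details are assumptions: the test size is round(p * nrow(x)), and in the logical vector TRUE marks the test rows.

```r
# Hypothetical re-implementation for illustration; not the DescTools code.
split_sketch <- function(x, p = 0.1, seed = NULL, logical = FALSE) {
  stopifnot(is.data.frame(x), p >= 0, p <= 1)
  if (!is.null(seed)) set.seed(seed)          # make the random draw reproducible
  n        <- nrow(x)
  test_idx <- sample.int(n, size = round(p * n))
  is_test  <- seq_len(n) %in% test_idx        # assumption: TRUE marks a test row
  if (logical) return(is_test)
  list(train = x[!is_test, , drop = FALSE],
       test  = x[is_test,  , drop = FALSE])
}

# List form: two data frames with the same columns as the input.
parts <- split_sketch(iris, p = 0.2, seed = 42)
sapply(parts, nrow)                           # train 120, test 30

# Logical form: one flag per row of the input.
flags <- split_sketch(iris, p = 0.2, seed = 42, logical = TRUE)
```

The logical form is convenient when the split should be stored as a column next to the data instead of producing two separate copies of it.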
Andri Signorell <andri@signorell.net>
SplitTrainTest(d.pima)