This method creates multi-label datasets for training, testing, validation or other
purposes, using the partition method defined in method. The number of
partitions is defined by the partitions parameter. Each instance is used
in only one partition of the division.
create_holdout_partition(
  mdata,
  partitions = c(train = 0.7, test = 0.3),
  method = c("random", "iterative", "stratified")
)
mdata | A mldr dataset.
partitions | A list of percentages or a single value. The sum of all values must not be greater than 1. If a single value is given, its complement is used to generate the second partition. If two or more values are given and their sum is lower than 1, the partitions are generated with the given proportions. If partitions have names, they are used to name the returned list. (Default: c(train = 0.7, test = 0.3))
method | The method used to split the data. The built-in methods are "random", "iterative" and "stratified". You can also create your own partition method; see the note and example sections for more details. (Default: "random")
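The rules for the partitions argument can be sketched in base R. This is only an illustration of the behaviour described above, not the package's own implementation; resolve_partitions is a hypothetical helper name, and the small tolerance on the sum is an assumption added to absorb floating-point rounding.

```r
# Hypothetical helper (not part of the package): illustrates the documented
# rules for the partitions argument.
resolve_partitions <- function(partitions) {
  total <- sum(partitions)
  # Documented constraint: the proportions must not sum to more than 1.
  # The tolerance is an assumption to absorb floating-point rounding.
  if (total > 1 + 1e-8) {
    stop("The sum of all partitions must not be greater than 1")
  }
  # A single value gets its complement as the second partition.
  if (length(partitions) == 1) {
    partitions <- c(partitions, 1 - total)
  }
  partitions
}

resolve_partitions(0.8)                         # 0.8 plus its complement
resolve_partitions(c(train = 0.6, test = 0.2))  # kept as given (sum below 1)
```

Names, when present, are carried through unchanged, which matches the statement that named partitions name the returned list.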
A list with at least two datasets, sampled as specified in the partitions parameter.
To create your own split method, you need to build a function that receives a mldr object and a list with the proportion of examples in each fold, and returns another list with the indexes of the elements for each fold.
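As a minimal sketch of that contract, the function below shuffles the instance indexes and cuts them into consecutive chunks. The name shuffled_split is hypothetical, and fake_mdata is a stand-in list that mocks only the measures$num.instances field the function reads, so the snippet runs without the package installed.

```r
# Hypothetical custom split method: shuffle the instance indexes, then cut
# them into consecutive chunks sized by the proportions in r.
shuffled_split <- function(mdata, r) {
  n <- mdata$measures$num.instances
  shuffled <- sample(n)                  # random permutation of 1..n
  # Chunk boundaries; the last fold takes whatever remains
  bounds <- c(0, trunc(cumsum(r) * n))
  bounds[length(bounds)] <- n
  lapply(seq_along(r), function(i) {
    shuffled[seq_len(bounds[i + 1] - bounds[i]) + bounds[i]]
  })
}

# Stand-in for a mldr object: only the field the function reads is mocked.
fake_mdata <- list(measures = list(num.instances = 10))
folds <- shuffled_split(fake_mdata, c(train = 0.7, test = 0.3))
lengths(folds)
```

Each index appears in exactly one fold, in line with the rule that an instance is used in only one partition.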
Sechidis, K., Tsoumakas, G., & Vlahavas, I. (2011). On the stratification of multi-label data. In Proceedings of the Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD (pp. 145-158).
Other sampling: create_kfold_partition(), create_random_subset(), create_subset()
library(utiml)  # provides create_holdout_partition() and the toyml dataset

dataset <- create_holdout_partition(toyml)
names(dataset)
## [1] "train" "test"
#dataset$train
#dataset$test

dataset <- create_holdout_partition(toyml, c(a=0.1, b=0.2, c=0.3, d=0.4))
names(dataset)
## [1] "a" "b" "c" "d"

# Custom split method: assigns consecutive blocks of instances to each fold
sequencial_split <- function (mdata, r) {
  amount <- trunc(r * mdata$measures$num.instances)
  indexes <- c(0, cumsum(amount))
  # The last fold runs to the final instance, absorbing rounding leftovers
  indexes[length(r)+1] <- mdata$measures$num.instances
  lapply(seq_along(r), function (i) {
    seq(indexes[i]+1, indexes[i+1])
  })
}
dataset <- create_holdout_partition(toyml, method="sequencial_split")
Loading required package: mldr
[1] "train" "test"