PrInDTMulev | R Documentation |
PrInDT analysis for a classification problem with more than 2 classes. For each combination of one class vs. the other classes, a 2-class PrInDT analysis is carried out.
The percentages for undersampling of the larger class ('percl' in PrInDT) are chosen so that the resulting sizes are comparable with the size of the smaller class, for which all observations are used in undersampling ('percs' = 1 in PrInDT).
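The size-matching idea can be illustrated with a minimal base R sketch. The helper `undersample_fraction` is hypothetical and not part of PrInDT, and the rule it uses (smaller size divided by larger size) is an assumption about what "comparable" means here; the package may choose the percentages differently.

```r
# Hypothetical helper (not part of PrInDT): fraction of the larger class to
# sample so that its subsample is about as large as the smaller class.
undersample_fraction <- function(y, positive) {
  n_pos  <- sum(y == positive)   # size of the class of interest
  n_rest <- sum(y != positive)   # size of all other classes combined
  n_small <- min(n_pos, n_rest)
  n_large <- max(n_pos, n_rest)
  n_small / n_large              # assumed rule: equalize the two sizes
}

# Toy class variable with three unbalanced levels (made-up counts).
y <- factor(c(rep("real", 80), rep("zero1", 15), rep("zero2", 5)))

# One fraction per "class k vs. the other classes" problem.
fractions <- sapply(levels(y), function(k) undersample_fraction(y, k))
fractions
```

In this toy data, the "real vs. rest" problem has 80 against 20 observations, so a quarter of the larger class would be sampled while the smaller class is used in full.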
The class with the highest probability in the K (= number of classes) analyses is chosen for prediction.
Interpretability is checked (see 'ctestv').
The parameters 'conf.level', 'minsplit', and 'minbucket' can be used to control the size of the trees.
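The prediction rule described above (take the class with the highest probability among the K one-vs-rest analyses) can be sketched in base R as follows. The probability matrix is made up for illustration, and the tie-breaking choice is an assumption; this is not the package's internal code.

```r
# Toy probabilities: rows are observations, columns are classes. Column k
# holds the probability of class k from the k-th "class k vs. rest" model.
# The numbers are invented for illustration only.
prob <- matrix(
  c(0.70, 0.20, 0.40,
    0.10, 0.65, 0.30,
    0.35, 0.30, 0.80),
  nrow = 3, byrow = TRUE,
  dimnames = list(NULL, c("real", "zero1", "zero2"))
)

# For every observation, pick the class whose model gives the highest
# probability. ties.method = "first" makes ties deterministic (assumption).
pred <- factor(colnames(prob)[max.col(prob, ties.method = "first")],
               levels = colnames(prob))
pred
```

Note that the rows need not sum to 1, because each column comes from a separate 2-class model.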
PrInDTMulev(datain,classname,ctestv=NA,N,conf.level=0.95,seedl=FALSE,
minsplit=NA,minbucket=NA)
datain |
Input data frame with class factor variable 'classname' and the |
classname |
Name of class variable (character) |
ctestv |
Vector of character strings of forbidden split results; |
N |
Number of repetitions (integer > 0) |
conf.level |
(1 - significance level) in function |
seedl |
Should the seed for random numbers be set (TRUE / FALSE)? |
minsplit |
Minimum number of elements in a node to be split; |
minbucket |
Minimum number of elements in a node; |
Standard output can be produced by means of print(name) or just name, as well as plot(name), where 'name' is the output data frame of the function.
The plot function will produce a series of more than one plot. If you use R, you might want to specify windows(record=TRUE) before plot(name) to save the whole series of plots. In RStudio this functionality is provided automatically.
levels of class variable
trees for the levels of the class variable; refer to an individual tree as trees[[k]], k = 1, ..., no. of levels
balanced accuracy of combined predictions
confusion matrix of combined predictions
no. of non-interpretable trees
balanced accuracies of best models for individual classes
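As an illustration of how a balanced accuracy relates to a confusion matrix, here is a minimal base R sketch. It assumes the common definition of balanced accuracy as the mean of the class-wise recalls; the toy labels are made up, and the object names are hypothetical rather than the names of the returned list elements.

```r
# Toy true and predicted classes (invented for illustration).
truth <- factor(c("a", "a", "a", "a", "b", "b", "c", "c", "c", "c"))
pred  <- factor(c("a", "a", "a", "b", "b", "b", "c", "c", "a", "a"),
                levels = levels(truth))

# Confusion matrix: rows are true classes, columns are predicted classes.
conf <- table(truth = truth, pred = pred)

# Class-wise recall: correctly predicted cases divided by true class size.
recall <- diag(conf) / rowSums(conf)

# Balanced accuracy (assumed definition): unweighted mean of the recalls.
bal_acc <- mean(recall)
bal_acc
```

Because every class contributes equally to the mean, a small class that is predicted poorly lowers this measure more than it would lower the plain accuracy.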
datastrat <- PrInDT::data_zero
data <- na.omit(datastrat)
ctestv <- NA
data$rel[data$ETH %in% c("C1a","C1b","C1c") & data$real == "zero"] <- "zero1"
data$rel[data$ETH %in% c("C2a","C2b","C2c") & data$real == "zero"] <- "zero2"
data$rel[data$real == "realized"] <- "real"
data$rel <- as.factor(data$rel) # rel is new class variable
data$real <- NULL # remove old class variable
N <- 51
conf.level <- 0.99 # 1 - significance level (mincriterion) in ctree
out <- PrInDTMulev(data,"rel",ctestv,N,conf.level)
out # print best models based on subsamples
plot(out) # corresponding plots