PrInDTAll | R Documentation |
ctree based on all observations in 'datain'. Interpretability is checked (see 'ctestv'); a probability threshold can be specified (see 'thres'). The parameters 'conf.level', 'minsplit', and 'minbucket' can be used to control the size of the trees.
Reference
Weihs, C., Buschfeld, S. 2021a. Combining Prediction and Interpretation in Decision Trees (PrInDT) - a Linguistic Example. arXiv:2103.02336
In the case of repeated measurements ('indrep=1'), the values of the substructure variable have to be given in 'repvar'. Only one value of 'classname' is allowed for each value of 'repvar'. If, for a value of 'repvar', the number of predictions of a value of 'classname' does not reach the percentage 'thr' of the observed occurrences of that value, a misclassification is detected.
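The rule for repeated measurements can be illustrated with a minimal base R sketch. It reflects one reading of the description above and is not the package's internal code; the toy vectors and the names 'observed', 'predicted', 'hit_share', and 'misclassified' are hypothetical.

```r
# Hypothetical toy data: one row per observation, grouped by 'repvar'.
repvar    <- c("A", "A", "A", "A", "B", "B", "B", "C", "C")
observed  <- c("yes", "yes", "yes", "yes", "no", "no", "no", "yes", "yes")
predicted <- c("yes", "yes", "no", "yes", "yes", "yes", "no", "no", "yes")
thr <- 0.5

# Each value of 'repvar' may carry only one class value.
stopifnot(all(tapply(observed, repvar, function(x) length(unique(x))) == 1))

# Share of predictions per value of 'repvar' that match its observed class.
hit_share <- tapply(predicted == observed, repvar, mean)

# Assumption: a value of 'repvar' is misclassified if the share stays below 'thr'.
misclassified <- names(hit_share)[hit_share < thr]
print(hit_share)
print(misclassified)
```

In this sketch a share exactly equal to 'thr' counts as reached; whether the package treats the boundary the same way is an assumption.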
PrInDTAll(datain, classname, ctestv=NA, conf.level=0.95, thres=0.5,
minsplit=NA, minbucket=NA, repvar=NA, indrep=0, thr=0.5)
datain | Input data frame with class factor variable 'classname' and the |
classname | Name of class variable (character) |
ctestv | Vector of character strings of forbidden split results; |
conf.level | (1 - significance level) in function |
thres | Probability threshold for prediction of smaller class (numerical, >= 0 and < 1); default = 0.5 |
minsplit | Minimum number of elements in a node to be split; |
minbucket | Minimum number of elements in a node; |
repvar | Values of variable defining the substructure in the case of repeated measurements, length = dim(datain)[1] necessary; default = NA |
indrep | Indicator of repeated measurements ('indrep=1'); default = 0 |
thr | Threshold for element classification: minimum percentage of correct class entries; default = 0.5 |
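The effect of the probability threshold 'thres' can be sketched in base R as follows. This is an illustration only: the helper 'predict_with_threshold', the class labels, and the strict comparison '>' are assumptions and are not taken from the package.

```r
# Hypothetical helper: assign the smaller class when its predicted
# probability exceeds 'thres' (assumption: strict inequality).
predict_with_threshold <- function(prob_small, thres = 0.5,
                                   labels = c("large", "small")) {
  stopifnot(thres >= 0, thres < 1)
  factor(ifelse(prob_small > thres, labels[2], labels[1]), levels = labels)
}

# Hypothetical predicted probabilities of the smaller class.
prob_small <- c(0.10, 0.35, 0.50, 0.80)

pred_default <- predict_with_threshold(prob_small)              # thres = 0.5
pred_lower   <- predict_with_threshold(prob_small, thres = 0.3) # lower threshold

print(table(pred_default))
print(table(pred_lower))
```

Lowering the threshold below 0.5 assigns more observations to the smaller class, which is useful when the classes are unbalanced.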
Standard output can be produced by means of print(name) or just name as well as plot(name), where 'name' is the output data frame of the function.
ctree based on all observations
balanced accuracy of 'treeall'
criterion of interpretability of 'treeall' (TRUE / FALSE)
confusion matrix of 'treeall'
Accuracy of full sample tree on Elements of large class
Accuracy of full sample tree on Elements of small class
balanced accuracy of full sample tree on Elements
Names of misclassified Elements by full sample tree of large class
Names of misclassified Elements by full sample tree of small class
Label of large class
Label of small class
Threshold for repeated measurements
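For orientation, balanced accuracy is commonly defined as the mean of the class-wise accuracies. The following base R sketch computes it from a confusion matrix built from hypothetical data; the orientation of the matrix (rows = predicted, columns = observed) and all variable names are assumptions and need not match the objects returned by the function.

```r
# Hypothetical observed and predicted classes for 12 observations.
lev <- c("large", "small")
observed  <- factor(c(rep("large", 8), rep("small", 4)), levels = lev)
predicted <- factor(c(rep("large", 7), "small",            # large class: 7 of 8 correct
                      "small", "small", "large", "large"), # small class: 2 of 4 correct
                    levels = lev)

# Confusion matrix (assumption: rows = predicted, columns = observed).
conf <- table(predicted, observed)

# Class-wise accuracy: correct predictions divided by observed class size.
acc_by_class <- diag(conf) / colSums(conf)

# Balanced accuracy: unweighted mean of the class-wise accuracies.
balanced_accuracy <- mean(acc_by_class)

# Overall accuracy for comparison; it is dominated by the large class.
overall_accuracy <- sum(diag(conf)) / sum(conf)

print(conf)
print(acc_by_class)
print(c(balanced = balanced_accuracy, overall = overall_accuracy))
```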
library(PrInDT) # attach the package so that PrInDTAll() is found
datastrat <- PrInDT::data_zero
data <- na.omit(datastrat) # drop observations with missing values
ctestv <- rbind('ETH == {C2a,C1a}','MLU == {1, 3}') # forbidden split results
conf.level <- 0.99 # 1 - significance level (mincriterion) in ctree
outAll <- PrInDTAll(data,"real",ctestv,conf.level)
print(outAll) # print model based on all observations
plot(outAll) # plot model based on all observations