Description
partDSA is a novel tool for generating a piecewise constant estimation sieve of candidate estimators based on an intensive and comprehensive search over the entire covariate space. The strength of this algorithm is that it builds both 'and' and 'or' statements, which allows regions to be combined and substituted. This makes it possible to discover intricate correlation patterns and interactions in addition to main effects. Depending on the application, this approach may outperform methods such as CART because it is both more aggressive and more flexible. As such, partDSA gives the user an additional tool for their statistical toolbox.
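To make the 'and'/'or' idea concrete, here is a minimal R sketch of a piecewise constant estimator in which one partition is the union ('or') of two regions, each region being a conjunction ('and') of inequalities. The partition rules, the helper names (in_region_a, in_region_b, assign_partition, fit_piecewise, predict_piecewise), and the toy data are hypothetical. partDSA finds its partitions by searching the covariate space rather than taking them as input, and this is not its implementation.

```r
# Illustrative sketch only: NOT partDSA's implementation.
# Each partition is a rule (function) returning TRUE for rows inside it.
# A partition may be a union ('or') of regions; each region is a
# conjunction ('and') of simple inequalities.
in_region_a <- function(d) d$x1 <= 0.5 & d$x2 <= 0.5
in_region_b <- function(d) d$x1 > 0.5 & d$x2 > 0.5

partitions <- list(
  # Partition 1: region a OR region b (two non-adjacent corners)
  function(d) in_region_a(d) | in_region_b(d),
  # Partition 2: the rest of the covariate space
  function(d) !(in_region_a(d) | in_region_b(d))
)

# Index of the first partition whose rule each row satisfies.
assign_partition <- function(d, parts) {
  hits <- vapply(parts, function(rule) rule(d), logical(nrow(d)))
  hits <- matrix(hits, nrow = nrow(d))  # keep matrix shape when nrow(d) == 1
  apply(hits, 1, function(row) which(row)[1])
}

# Piecewise constant fit: the estimate in each partition is its mean outcome.
fit_piecewise <- function(d, y, parts) {
  idx <- assign_partition(d, parts)
  means <- tapply(y, factor(idx, levels = seq_along(parts)), mean)
  list(parts = parts, means = as.numeric(means))
}

predict_piecewise <- function(fit, newdata) {
  fit$means[assign_partition(newdata, fit$parts)]
}

# Toy data: rows 1-2 fall in the 'or' partition, rows 3-4 in the other one.
train <- data.frame(x1 = c(0.1, 0.9, 0.2, 0.8), x2 = c(0.2, 0.8, 0.9, 0.1))
y <- c(1, 3, 10, 20)
fit <- fit_piecewise(train, y, partitions)
pred <- predict_piecewise(fit, data.frame(x1 = c(0.3, 0.7), x2 = c(0.4, 0.4)))
```

A single axis-aligned tree split cannot place the two diagonal corners in the same terminal node, whereas an 'or' statement can; this is the kind of combination the description refers to.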
Usage
partDSA(x, y, wt=rep(1, nrow(x)), x.test=x, y.test=y, wt.test,
control=DSA.control(), sleigh)
DSA.control(vfold=10, minsplit = 20, minbuck=round(minsplit/3),
cut.off.growth=10, MPD=0.1, missing="impute.at.split",
loss.function="default", wt.method="KM", brier.vec=NULL,
leafy=0, leafy.random.num.variables.per.split=4,
leafy.num.trees=50, leafy.subsample=0, save.input=FALSE,
boost=0, boost.rounds=100, cox.vec=NULL,IBS.wt=NULL, partial=NULL)
Arguments
x
The matrix or data frame of predictor variables for the training set, used to build the model. Each row corresponds to an observation, and each column corresponds to a variable.
y
The outcome (response) vector, either continuous or categorical, representing the true response values for observations in x.
wt
Optional vector of training weights with length equal to the number of observations in x. The default value is rep(1, nrow(x)).
x.test
The matrix or data frame of predictor variables for the test set, used to evaluate the model. The number of columns (variables) of x.test should equal the number of columns of x. The default value is x.
y.test
The outcome (response) vector, either continuous or categorical, representing the true response values for observations in x.test. The default value is y.
wt.test
Optional vector of test weights with length equal to the number of test set observations. The default value is rep(1, nrow(x.test)).
control
A list object used to specify additional control parameters. This is normally created by calling the DSA.control function.
sleigh
Optional sleigh object.
vfold
The number of folds of cross-validation for the model building process. The default value is 10.
minsplit
The minimum number of observations a partition must contain in order to be split into two partitions. The default value is 20.
minbuck
The minimum number of observations in any terminal partition. The default value is round(minsplit/3).
cut.off.growth
The maximum number of terminal partitions to be considered when building the model. The default value is 10.
MPD
Minimum Percent Difference. The model fit must improve by at least this percentage for a change to be considered, which saves time in the model building process. The default value is 0.1.
missing
Character string specifying how missing data should be handled: "no" or "impute.at.split". The default value is "impute.at.split". See the Details section for more information.
loss.function
The function to be minimized when building the model. For categorical outcomes, "entropy" (default) or "gini" can be specified. For continuous outcomes, the L2 loss function is used.
wt.method
Not documented yet.
brier.vec
Not documented yet.
cox.vec
Not documented yet.
IBS.wt
Not documented yet.
leafy
Set to 1 to run Bagged partDSA. The default value is 0.
leafy.random.num.variables.per.split
Number of variables considered at each split when random variable selection is used in Bagged partDSA. The default value is 4.
leafy.num.trees
Number of trees in Bagged partDSA. The default value is 50.
leafy.subsample
Numeric value between 0 and 1. The value 0 requests bootstrap sampling (sampling with replacement). A value greater than 0 gives the proportion of observations used to build each model, for example 0.632. The default value is 0.
save.input
Logical value indicating whether the input should be saved. The default value is FALSE.
boost
Set to 1 to run Boosted partDSA. The default value is 0.
boost.rounds
Maximum number of rounds of boosting. The default value is 100.
partial
If set to "deciles", step partial importance is computed on the deciles of the data rather than on the observed values. The default value is NULL.
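The loss functions named under loss.function can be written out as below. This is a sketch using the standard textbook definitions of entropy, the Gini index, and L2 loss; the helper names (entropy_loss, gini_loss, l2_loss, partition_loss) are hypothetical, and partDSA may weight or scale these quantities differently internally.

```r
# Textbook node-loss functions matching the names accepted by loss.function.
# Assumption: partDSA may scale or weight these differently internally.

# Entropy for a categorical outcome: -sum_k p_k * log(p_k)
entropy_loss <- function(y) {
  p <- as.numeric(table(y)) / length(y)
  p <- p[p > 0]            # treat 0 * log(0) as 0
  -sum(p * log(p))
}

# Gini index for a categorical outcome: 1 - sum_k p_k^2
gini_loss <- function(y) {
  p <- as.numeric(table(y)) / length(y)
  1 - sum(p^2)
}

# L2 loss for a continuous outcome: mean squared deviation from the
# partition mean, i.e. the error of a constant estimate within one partition.
l2_loss <- function(y) mean((y - mean(y))^2)

# Loss of a whole partitioning: size-weighted average of per-partition losses.
# idx gives the partition label of each observation.
partition_loss <- function(y, idx, loss) {
  groups <- split(y, idx)
  sum(vapply(groups, function(g) length(g) * loss(g), numeric(1))) / length(y)
}
```

For example, partition_loss(c(1, 3, 10, 20), c(1, 1, 2, 2), l2_loss) averages a loss of 1 in the first partition and 25 in the second, giving 13.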
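The interplay between minsplit and minbuck can be expressed as a simple check on a proposed binary split. split_allowed() is a hypothetical helper written for illustration, not a partDSA function; it assumes minsplit applies to the partition being split and minbuck to each of the two resulting partitions.

```r
# Hypothetical helper (not a partDSA function) expressing the two size
# constraints for a proposed binary split of a partition.
split_allowed <- function(n_left, n_right, minsplit = 20,
                          minbuck = round(minsplit / 3)) {
  parent_ok   <- (n_left + n_right) >= minsplit   # minsplit: size of parent
  children_ok <- min(n_left, n_right) >= minbuck  # minbuck: size of each child
  parent_ok && children_ok
}

# Default minbuck with the default minsplit: round(20 / 3) = round(6.67) = 7
default_minbuck <- round(20 / 3)
```

With the defaults, a partition of 20 observations can be split 13/7 but not 14/6, and a partition of 19 observations cannot be split at all.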
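For Bagged partDSA, the two settings of leafy.subsample correspond to the two resampling schemes sketched below. draw_training_rows() is hypothetical, and drawing the subsample without replacement is an assumption; partDSA's internal sampling code may differ.

```r
# Sketch of the resampling schemes described for leafy.subsample.
# draw_training_rows() is hypothetical; subsampling without replacement
# is an assumption, not documented partDSA behavior.
draw_training_rows <- function(n, subsample = 0) {
  stopifnot(subsample >= 0, subsample <= 1)
  if (subsample == 0) {
    # bootstrap: n row indices drawn with replacement
    sample.int(n, n, replace = TRUE)
  } else {
    # subsample: a proportion of the rows drawn without replacement
    sample.int(n, round(subsample * n), replace = FALSE)
  }
}

set.seed(1)
boot_rows <- draw_training_rows(100)
sub_rows  <- draw_training_rows(100, subsample = 0.632)
```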
Details
missing
Setting missing="no" indicates that there is no missing data; an error is raised if missing data are found in the dataset. Setting missing="impute.at.split" uses a data imputation method similar to that in CRUISE (Kim and Loh, 2001). At each split, the non-missing observations for a given variable are used to find the best split, and the missing observations are imputed using the mean (for continuous variables) or the mode (for categorical variables) of the non-missing observations in that node. Once the node assignment of these observations has been determined using the imputed values, the imputed values are returned to their missing status. For missing values in the test set, the grand mean or mode of the corresponding variable in the training set is used.
Including variables that are entirely missing results in an error.
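The impute-at-split steps can be sketched for a single variable in a single node as follows. The helper names are hypothetical, and the routing rule (values at or below a cutoff go to the left child) is an assumed example of a split on a continuous variable; this illustrates the procedure described above, not partDSA's code.

```r
# Sketch of impute-at-split for one variable within one node (hypothetical
# helpers, not partDSA code).
impute_in_node <- function(v) {
  obs <- v[!is.na(v)]
  if (length(obs) == 0) stop("variable is entirely missing in this node")
  fill <- if (is.numeric(v)) {
    mean(obs)                          # continuous: node mean
  } else {
    counts <- table(obs)
    names(counts)[which.max(counts)]   # categorical: node mode (first if tied)
  }
  v[is.na(v)] <- fill
  v
}

# TRUE = left child (imputed value <= cutoff). Only a modified copy is used
# for routing; the caller's v keeps its NAs, mirroring how the imputed values
# are returned to missing once node membership is decided.
goes_left <- function(v, cutoff) impute_in_node(v) <= cutoff
```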
Examples
library(partDSA)
library(MASS)
set.seed(6442)
n <- nrow(Boston)
tr.n <- floor(n / 2)
train.index <- sample(1:n, tr.n, replace=FALSE)
test.index <- (1:n)[-train.index]
x <- Boston[train.index, -14]
y <- Boston[train.index, 14]
x.test <- Boston[test.index, -14]
y.test <- Boston[test.index, 14]
control <- DSA.control(vfold=1) # no cross-validation
partDSA(x, y, x.test=x.test, y.test=y.test, control=control)
Loading required package: survival
partDSA object
# partitions test risk
1 84.588413
2 52.101930
3 42.117736
4 31.463532
5 31.105841
6 30.713062
7 29.701368
8 28.205055
9 29.123699
10 28.537634
Outcome:
Best of 1 partitions:
Part.1
22.343
Best of 2 partitions:
Part.1 Part.2
32.326 18.294
Best of 3 partitions:
Part.1 Part.2 Part.3
27.093 18.294 40.736
Best of 4 partitions:
Part.1 Part.2 Part.3 Part.4
27.093 20.971 40.736 14.279
Best of 5 partitions:
Part.1 Part.2 Part.3 Part.4 Part.5
27.093 20.971 44.278 14.279 34.36
Best of 6 partitions:
Part.1 Part.2 Part.3 Part.4 Part.5 Part.6
27.093 20.971 44.278 19.136 34.36 13.107
Best of 7 partitions:
Part.1 Part.2 Part.3 Part.4 Part.5 Part.6 Part.7
32.478 20.971 44.278 19.136 34.36 13.107 25.747
Best of 8 partitions:
Part.1 Part.2 Part.3 Part.4 Part.5 Part.6 Part.7 Part.8
32.478 22.635 44.278 19.136 34.36 13.107 25.747 19.245
Best of 9 partitions:
Part.1 Part.2 Part.3 Part.4 Part.5 Part.6 Part.7 Part.8 Part.9
32.478 21.783 44.278 19.136 34.36 13.107 25.747 19.245 28.471
Best of 10 partitions:
Part.1 Part.2 Part.3 Part.4 Part.5 Part.6 Part.7 Part.8 Part.9 Part.10
32.478 21.783 44.278 19.136 34.36 14.445 25.747 19.245 28.471 10.565
Best 2 partitions
Partition 1 [of 2]:
(lstat <= 7.560000)
Partition 2 [of 2]:
(7.560000 < lstat)
Best 3 partitions
Partition 1 [of 3]:
(rm <= 7.007000) && (lstat <= 7.560000)
Partition 2 [of 3]:
(7.560000 < lstat)
Partition 3 [of 3]:
(7.007000 < rm) && (lstat <= 7.560000)
Best 4 partitions
Partition 1 [of 4]:
(rm <= 7.007000) && (lstat <= 7.560000)
Partition 2 [of 4]:
(7.560000 < lstat <= 16.210000)
Partition 3 [of 4]:
(7.007000 < rm) && (lstat <= 7.560000)
Partition 4 [of 4]:
(16.210000 < lstat)
Best 5 partitions
Partition 1 [of 5]:
(rm <= 7.007000) && (lstat <= 7.560000)
Partition 2 [of 5]:
(7.560000 < lstat <= 16.210000)
Partition 3 [of 5]:
(7.007000 < rm) && (lstat <= 5.120000)
Partition 4 [of 5]:
(16.210000 < lstat)
Partition 5 [of 5]:
(7.007000 < rm) && (5.120000 < lstat <= 7.560000)
Best 6 partitions
Partition 1 [of 6]:
(rm <= 7.007000) && (lstat <= 7.560000)
Partition 2 [of 6]:
(7.560000 < lstat <= 16.210000)
Partition 3 [of 6]:
(7.007000 < rm) && (lstat <= 5.120000)
Partition 4 [of 6]:
(nox <= 0.581000) && (16.210000 < lstat)
Partition 5 [of 6]:
(7.007000 < rm) && (5.120000 < lstat <= 7.560000)
Partition 6 [of 6]:
(0.581000 < nox) && (16.210000 < lstat)
Best 7 partitions
Partition 1 [of 7]:
(rm <= 7.007000) && (lstat <= 4.560000)
Partition 2 [of 7]:
(7.560000 < lstat <= 16.210000)
Partition 3 [of 7]:
(7.007000 < rm) && (lstat <= 5.120000)
Partition 4 [of 7]:
(nox <= 0.581000) && (16.210000 < lstat)
Partition 5 [of 7]:
(7.007000 < rm) && (5.120000 < lstat <= 7.560000)
Partition 6 [of 7]:
(0.581000 < nox) && (16.210000 < lstat)
Partition 7 [of 7]:
(rm <= 7.007000) && (4.560000 < lstat <= 7.560000)
Best 8 partitions
Partition 1 [of 8]:
(rm <= 7.007000) && (lstat <= 4.560000)
Partition 2 [of 8]:
(7.560000 < lstat <= 11.640000)
Partition 3 [of 8]:
(7.007000 < rm) && (lstat <= 5.120000)
Partition 4 [of 8]:
(nox <= 0.581000) && (16.210000 < lstat)
Partition 5 [of 8]:
(7.007000 < rm) && (5.120000 < lstat <= 7.560000)
Partition 6 [of 8]:
(0.581000 < nox) && (16.210000 < lstat)
Partition 7 [of 8]:
(rm <= 7.007000) && (4.560000 < lstat <= 7.560000)
Partition 8 [of 8]:
(11.640000 < lstat <= 16.210000)
Best 9 partitions
Partition 1 [of 9]:
(rm <= 7.007000) && (lstat <= 4.560000)
Partition 2 [of 9]:
(rad <= 7.000000) && (7.560000 < lstat <= 11.640000)
Partition 3 [of 9]:
(7.007000 < rm) && (lstat <= 5.120000)
Partition 4 [of 9]:
(nox <= 0.581000) && (16.210000 < lstat)
Partition 5 [of 9]:
(7.007000 < rm) && (5.120000 < lstat <= 7.560000)
Partition 6 [of 9]:
(0.581000 < nox) && (16.210000 < lstat)
Partition 7 [of 9]:
(rm <= 7.007000) && (4.560000 < lstat <= 7.560000)
Partition 8 [of 9]:
(11.640000 < lstat <= 16.210000)
Partition 9 [of 9]:
(7.000000 < rad) && (7.560000 < lstat <= 11.640000)
Best 10 partitions
Partition 1 [of 10]:
(rm <= 7.007000) && (lstat <= 4.560000)
Partition 2 [of 10]:
(rad <= 7.000000) && (7.560000 < lstat <= 11.640000)
Partition 3 [of 10]:
(7.007000 < rm) && (lstat <= 5.120000)
Partition 4 [of 10]:
(nox <= 0.581000) && (16.210000 < lstat)
Partition 5 [of 10]:
(7.007000 < rm) && (5.120000 < lstat <= 7.560000)
Partition 6 [of 10]:
(crim <= 9.966540) && (0.581000 < nox) && (16.210000 < lstat)
Partition 7 [of 10]:
(rm <= 7.007000) && (4.560000 < lstat <= 7.560000)
Partition 8 [of 10]:
(11.640000 < lstat <= 16.210000)
Partition 9 [of 10]:
(7.000000 < rad) && (7.560000 < lstat <= 11.640000)
Partition 10 [of 10]:
(9.966540 < crim) && (0.581000 < nox) && (16.210000 < lstat)
Variable importance matrix:
COG=1 COG=2 COG=3 COG=4 COG=5 COG=6 COG=7 COG=8 COG=9 COG=10
crim 0 0 0 0 0 0 0 0 0 2
zn 0 0 0 0 0 0 0 0 0 0
indus 0 0 0 0 0 0 0 0 0 0
chas 0 0 0 0 0 0 0 0 0 0
nox 0 0 0 0 0 2 2 2 2 3
rm 0 0 2 2 3 3 4 4 4 4
age 0 0 0 0 0 0 0 0 0 0
dis 0 0 0 0 0 0 0 0 0 0
rad 0 0 0 0 0 0 0 0 2 2
tax 0 0 0 0 0 0 0 0 0 0
ptratio 0 0 0 0 0 0 0 0 0 0
black 0 0 0 0 0 0 0 0 0 0
lstat 0 2 3 4 5 6 7 8 9 10
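Comparing the variable importance matrix with the partition rules printed above suggests that, in this example, each entry equals the number of partitions in the best k-partition model whose rule mentions the variable (for instance, at COG=3, rm appears in partitions 1 and 3 and lstat in all three). The sketch below counts those mentions. count_variable_use() is hypothetical, and this correspondence is an observation about this output, not a documented definition of partDSA's importance measure.

```r
# Count how many partition rules mention each variable (hypothetical helper).
count_variable_use <- function(rules, vars) {
  vapply(vars, function(v) {
    # \b word boundaries so that e.g. "rm" does not match inside other names
    sum(grepl(paste0("\\b", v, "\\b"), rules, perl = TRUE))
  }, integer(1))
}

# Rules copied from "Best 3 partitions" in the output above.
rules3 <- c("(rm <= 7.007000) && (lstat <= 7.560000)",
            "(7.560000 < lstat)",
            "(7.007000 < rm) && (lstat <= 7.560000)")
use3 <- count_variable_use(rules3, c("crim", "nox", "rm", "lstat"))
```

The counts for crim, nox, rm, and lstat (0, 0, 2, 3) match the COG=3 column of the matrix.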