madlib.rpart: MADlib wrapper function for Decision Tree
In PivotalR: A Fast, Easy-to-Use Tool for Manipulating Tables in Databases and a Wrapper of MADlib

Description

This function is a wrapper of MADlib's decision tree model training function. The resulting tree is stored in a table in the database, and one can also view the result from R using plot.dt.madlib, text.dt.madlib and print.dt.madlib.

Usage

madlib.rpart(formula, data, weights = NULL, id = NULL, na.action = NULL, parms, control, na.as.level = FALSE, verbose = FALSE, ...)

Arguments

`formula`	A formula object. The intercept term is removed automatically, and factors are not expanded into dummy variables. Grouping syntax is also supported; see `madlib.lm` and `madlib.glm` for details.
`data`	A `db.obj` object, which wraps the data in the database.
`weights`	A string, the column name for the weights.
`id`	A string, the name of the column that indexes each row. If a `key` has been set for `data`, the key is used as the ID unless this argument is also specified. An ID is required so that the result of `predict.dt.madlib` can be compared with the original data.
`na.action`	A function that filters `NULL` values from the data. Not implemented yet.
`parms`	A list of parameters for the splitting function. Supported parameter: 'split', which selects the split criterion. Options are 'gini', 'misclassification' and 'entropy' for classification, and 'mse' for regression. The default is 'gini' for classification and 'mse' for regression.
`control`	A list of parameters for the fit. Supported parameters: 'minsplit', the minimum number of observations that must be present in a node for a split to be attempted (default minsplit=20); 'minbucket', the minimum number of observations in any terminal node (default minsplit/3); 'maxdepth', the maximum depth of any node (default maxdepth=10); 'nbins', the number of bins used to find candidate split thresholds for continuous variables (default 100, must be greater than 1); 'cp', the cost complexity parameter (default cp=0.01); 'n_folds', the number of cross-validation folds; 'max_surrogates', the maximum number of surrogates to use for each node.
`na.as.level`	A boolean indicating whether a NULL value in a categorical variable is treated as a distinct level. Default is na.as.level=FALSE.
`verbose`	A boolean indicating whether to print more information. Default is verbose=FALSE.
`...`	Arguments to be passed to or from other methods.
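The split criteria accepted by `parms$split` are standard node-impurity measures. The base-R sketch below only illustrates what each criterion scores, assuming the textbook definitions; it is not MADlib's implementation, and `node_impurity` and `node_mse` are hypothetical helpers. A lower value means a purer node, and the tree grower prefers splits that reduce impurity the most. It runs without a database connection.

```r
# Hypothetical helper, not part of PivotalR or MADlib: impurity of a single
# node, computed from the class labels of the rows that fall into it.
node_impurity <- function(labels,
                          split = c("gini", "entropy", "misclassification")) {
    split <- match.arg(split)
    p <- as.numeric(table(labels)) / length(labels)  # class proportions
    p <- p[p > 0]                                    # drop empty classes
    switch(split,
           gini              = 1 - sum(p^2),
           entropy           = -sum(p * log2(p)),    # base-2 logarithm
           misclassification = 1 - max(p))
}

# Hypothetical helper for the regression case ('mse'): mean squared
# deviation of the response from the node mean.
node_mse <- function(y) mean((y - mean(y))^2)

y <- c(TRUE, TRUE, TRUE, FALSE)  # e.g. values of rings < 10 within one node
node_impurity(y, "gini")               # 0.375
node_impurity(y, "entropy")            # about 0.811
node_impurity(y, "misclassification")  # 0.25
node_mse(c(2, 4, 6))                   # 8/3, about 2.667
```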

Value

An S3 object of class dt.madlib when no grouping is used, or of class dt.madlib.grp when grouping is used.

Author(s)

Author: Predictive Analytics Team at Pivotal Inc.

Maintainer: Frank McQuillan, Pivotal Inc. fmcquillan@pivotal.io

References

[1] Documentation of decision tree in MADlib 1.6, https://madlib.apache.org/docs/latest/

See Also

plot.dt.madlib, text.dt.madlib and print.dt.madlib are visualization functions for a model fitted with madlib.rpart.

predict.dt.madlib is a wrapper for MADlib's predict function for decision trees.

madlib.lm, madlib.glm, madlib.summary, madlib.arima, madlib.elnet are all MADlib wrapper functions.

Examples

## Not run: 


## set up the database connection
## Assume that .port is port number and .dbname is the database name
cid <- db.connect(port = .port, dbname = .dbname, verbose = FALSE)

x <- as.db.data.frame(abalone, conn.id = cid, verbose = FALSE)
lk(x, 10)

## decision tree using abalone data, using default values of minsplit,
## maxdepth etc.
key(x) <- "id"
fit <- madlib.rpart(rings < 10 ~ length + diameter + height + whole + shell,
       data=x, parms = list(split='gini'), control = list(cp=0.005))
fit

## Another example, using grouping
fit <- madlib.rpart(rings < 10 ~ length + diameter + height + whole + shell | sex,
       data=x, parms = list(split='gini'), control = list(cp=0.005))
fit

db.disconnect(cid)

## End(Not run)
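In the second example, `sex` after `|` is the grouping variable; MADlib trains a separate tree for each group. The base-R sketch below shows how such a formula can be taken apart into model terms and grouping variables. `split_grouping` is a hypothetical helper, not PivotalR's internal parser; it relies on `|` binding more loosely than `+` in R, so the right-hand side parses as `(length + diameter + height + whole + shell) | sex`. It runs without a database connection.

```r
# Hypothetical helper (not PivotalR's parser): separate the model terms
# from the grouping variables in a formula of the form  y ~ x1 + x2 | g
split_grouping <- function(f) {
    rhs <- f[[3]]  # right-hand side of the formula
    if (is.call(rhs) && identical(rhs[[1]], as.name("|"))) {
        list(terms  = all.vars(rhs[[2]]),   # variables left of '|'
             groups = all.vars(rhs[[3]]))   # grouping variables right of '|'
    } else {
        list(terms = all.vars(rhs), groups = character(0))  # no grouping
    }
}

f <- rings < 10 ~ length + diameter + height + whole + shell | sex
parts <- split_grouping(f)
parts$terms   # "length" "diameter" "height" "whole" "shell"
parts$groups  # "sex"
```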

PivotalR documentation built on March 13, 2021, 1:06 a.m.
