This function is a wrapper of MADlib's decision tree model training
function. The resulting tree is stored in a table in the database, and one
can also view the result from R using `plot.dt.madlib`, `text.dt.madlib`
and `print.dt.madlib`.


| Argument | Description |
| --- | --- |
| `formula` | A formula object. The intercept term is removed automatically, and factors are not expanded into their dummy variables. Grouping syntax is also supported, see |
| `data` | A |
| `weights` | A string, the column name for the weights. |
| `id` | A string, the index for each row. If |
| `na.action` | A function, which filters the |
| `parms` | A list of parameters for the splitting function. Supported parameters include 'split', which specifies the split function to use. Options are 'gini', 'misclassification' and 'entropy' for classification, and 'mse' for regression. The default is 'gini' for classification and 'mse' for regression. |
| `control` | A list of parameters for the fit. Supported parameters include: 'minsplit' - minimum number of observations that must be present in a node for a split to be attempted, default is minsplit=20; 'minbucket' - minimum number of observations in any terminal node, default is minsplit/3; 'maxdepth' - maximum depth of any node, default is maxdepth=10; 'nbins' - number of bins used to find possible node split threshold values for continuous variables, default is 100 (must be greater than 1); 'cp' - cost complexity parameter, default is cp=0.01; 'n_folds' - number of cross-validation folds; 'max_surrogates' - the number of surrogates. |
| `na.as.level` | A boolean, indicating whether a NULL value of a categorical variable is treated as a distinct level. Default is na.as.level=FALSE. |
| `verbose` | A boolean, indicating whether or not to print more info. Default is verbose=FALSE. |
| `...` | Arguments to be passed to or from other methods. |
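
The `parms` and `control` arguments are plain R lists, so they can be assembled and checked before any database call is made. The sketch below is illustrative only: `make_rpart_args` is a hypothetical helper, not part of PivotalR, and rounding `minsplit / 3` to get `minbucket` is an assumption, since the description above only states minsplit/3. Only the parameters with documented defaults are filled in.

```r
## Hypothetical helper (not part of PivotalR): build the `parms` and
## `control` lists from the defaults listed in the argument descriptions.
make_rpart_args <- function(split = c("gini", "misclassification",
                                      "entropy", "mse"),
                            minsplit = 20,
                            minbucket = round(minsplit / 3),  # rounding is an assumption
                            maxdepth = 10, nbins = 100, cp = 0.01) {
    ## reject anything other than the four documented split functions
    split <- match.arg(split)
    ## nbins must be greater than 1 according to the description of `control`
    stopifnot(nbins > 1)
    list(parms   = list(split = split),
         control = list(minsplit = minsplit, minbucket = minbucket,
                        maxdepth = maxdepth, nbins = nbins, cp = cp))
}

rpart_args <- make_rpart_args(split = "entropy", cp = 0.005)
str(rpart_args)
```

The two components could then be supplied as `parms = rpart_args$parms` and `control = rpart_args$control` in a call to `madlib.rpart`.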

An S3 object of type dt.madlib in the case of non-grouping, and of type dt.madlib.grp in the case of grouping.
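
Because the class of the returned object depends on whether grouping was used, code that consumes the result may want to branch on it. A minimal sketch, assuming nothing about the internal layout of the objects: `describe_fit` is a hypothetical helper, and `single` and `grouped` are empty stand-ins that carry only the class attribute, not real fitted models.

```r
## Hypothetical helper: report which kind of madlib.rpart result was passed.
describe_fit <- function(fit) {
    if (inherits(fit, "dt.madlib.grp")) {
        "grouped fit (dt.madlib.grp)"
    } else if (inherits(fit, "dt.madlib")) {
        "single fit (dt.madlib)"
    } else {
        stop("not a madlib.rpart result")
    }
}

## Stand-in objects with only the class set; a real fit comes from madlib.rpart
single  <- structure(list(), class = "dt.madlib")
grouped <- structure(list(), class = "dt.madlib.grp")

describe_fit(single)
describe_fit(grouped)
```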

Author: Predictive Analytics Team at Pivotal Inc.

Maintainer: Frank McQuillan, Pivotal Inc.

[1] Documentation of decision tree in MADlib 1.6, http://doc.madlib.net/latest/

`plot.dt.madlib`, `text.dt.madlib`, `print.dt.madlib` are visualization functions for a model fitted through `madlib.rpart`.

`predict.dt.madlib` is a wrapper for MADlib's predict function for decision trees.

`madlib.lm`, `madlib.glm`, `madlib.summary`, `madlib.arima`, `madlib.elnet` are all MADlib wrapper functions.

```
## Not run:
## madlib.rpart and the abalone data set are provided by the PivotalR package
library(PivotalR)

## set up the database connection
## Assume that .port is port number and .dbname is the database name
cid <- db.connect(port = .port, dbname = .dbname, verbose = FALSE)

x <- as.db.data.frame(abalone, conn.id = cid, verbose = FALSE)
lk(x, 10)

## decision tree using abalone data, using default values of minsplit,
## maxdepth etc.
key(x) <- "id"
fit <- madlib.rpart(rings < 10 ~ length + diameter + height + whole + shell,
                    data = x, parms = list(split = 'gini'),
                    control = list(cp = 0.005))
fit

## Another example, using grouping
fit <- madlib.rpart(rings < 10 ~ length + diameter + height + whole + shell | sex,
                    data = x, parms = list(split = 'gini'),
                    control = list(cp = 0.005))
fit

db.disconnect(cid)
## End(Not run)
```

