This function wraps MADlib's SVM for classification, regression and novelty detection.
madlib.svm(formula, data,
           na.action = NULL, na.as.level = FALSE,
           type = c("classification", "regression", "one-class"),
           kernel = c("gaussian", "linear", "polynomial"),
           degree = 3, gamma = NULL, coef0 = 1.0, class.weight = NULL,
           tolerance = 1e-10, epsilon = NULL, cross = 0, lambda = 0.01,
           control = list(), verbose = FALSE, ...)
formula |
an object of class " |
data |
An object of |
na.action |
A string which indicates what should happen when the data
contain |
na.as.level |
A logical value, default is |
type |
A string, default: "classification". Indicates the type of analysis to perform: "classification", "regression" or "one-class". |
kernel |
A string, default: "gaussian". Type of kernel. Currently three kernel types are supported: 'linear', 'gaussian', and 'polynomial'. |
degree |
Default: 3. The parameter needed for polynomial kernel |
gamma |
Default: 1/num_features. The parameter needed for gaussian kernel |
coef0 |
Default: 1.0. The independent term in polynomial kernel |
class.weight |
Default: 1.0. Sets the weight for the positive and negative classes. If not given, all classes have weight one. If class_weight = balanced, values of y are automatically adjusted to be inversely proportional to class frequencies in the input data, i.e. the weights are set as n_samples / (n_classes * bincount(y)). Alternatively, class_weight can be a mapping that gives the weight for each class. E.g., for dependent variable values 'a' and 'b', the class_weight can be a: 2, b: 3. Each 'a' tuple's y value is then multiplied by 2 and each 'b' tuple's y value by 3. For regression, the class weights are always one. |
tolerance |
Default: 1e-10. The criterion to end iterations. Training stops when the difference between the models of two consecutive iterations is smaller than tolerance, or when the iteration number is larger than max_iter. |
epsilon |
Default: [0.01]. Determines the epsilon for epsilon-SVR. Ignored during classification. When training the model, differences of less than epsilon between estimated labels and actual labels are ignored. A larger epsilon will yield a model with fewer support vectors, but will not generalize as well to future data. Generally, it has been suggested that epsilon should increase with noisier data, and decrease with the number of samples. See [5]. |
cross |
Default: 0. Number of folds (k). Must be at least 2 to activate cross validation. If a value of k > 2 is specified, each fold is then used as a validation set once, while the other k - 1 folds form the training set. |
lambda |
Default: [0.01]. Regularization parameter. Must be non-negative. |
control |
A list, which contains additional control parameters for the optimizer. |
verbose |
A logical value, default: FALSE. Verbose output of the results of training. |
... |
More parameters can be passed into this function. Currently, this is just a placeholder and any parameter passed here is ignored. |
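The "balanced" rule quoted in the class.weight entry above, n_samples / (n_classes * bincount(y)), can be illustrated locally in plain R. This is only a sketch of the arithmetic: the helper name balanced_weights and the label vector y are hypothetical, and MADlib performs the actual computation inside the database.

```r
# Hypothetical helper (not part of PivotalR): weight for each class,
# computed as n_samples / (n_classes * count of that class).
balanced_weights <- function(y) {
  counts <- table(y)                      # per-class frequencies
  w <- length(y) / (length(counts) * as.vector(counts))
  names(w) <- names(counts)
  w
}

# Illustrative labels: three 'a' tuples and one 'b' tuple.
y <- c("a", "a", "a", "b")
w <- balanced_weights(y)
print(w)
```

The rarer class 'b' receives the larger weight, which is the point of the "balanced" setting.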
For details about how to write a formula, see formula. "|" can be used at the end of the formula to denote that the fitting is done conditioned on the values of one or more variables. For example, y ~ x + sin(z) | v + w fits a separate model for each distinct combination of the values of v and w.
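As a rough local analogue of what "each distinct combination of the values of v and w" means, the following base R sketch splits an ordinary data.frame by the two grouping columns. The data frame df and its values are made up for illustration; PivotalR does the grouping in the database, not with split().

```r
# Made-up local data; v and w play the role of the grouping columns.
df <- data.frame(
  y = c(1.0, 1.5, 2.1, 3.2, 3.9, 5.1, 5.8, 7.0),
  x = 1:8,
  v = c("p", "p", "p", "q", "q", "q", "q", "q"),
  w = c(1, 1, 1, 1, 1, 2, 2, 2)
)

# One subset per combination of (v, w) that actually occurs in the data;
# drop = TRUE discards combinations with no rows, here (v = "p", w = 2).
groups <- split(df, list(df$v, df$w), drop = TRUE)

# Each element corresponds to one group that would get its own model.
print(sapply(groups, nrow))
```

With grouping, the number of fitted models equals the number of non-empty combinations, not the product of the numbers of distinct values in each column.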
If there is no grouping (i.e. no | in the formula), the result is a svm.madlib object. Otherwise, it is a svm.madlib.grps object, which is just a list of svm.madlib objects.
A svm.madlib object is a list which contains the following items:
coef |
A vector, the fitting coefficients. |
grps |
An integer, the number of groups that the data is divided into according to the grouping columns in the formula. |
grp.cols |
An array of strings. The column names of the grouping columns. |
has.intercept |
A logical, whether the intercept is included in the fitting. |
ind.vars |
An array of strings, all the different terms used as independent variables in the fitting. |
ind.str |
A string. The independent variables in an array format string. |
call |
A language object. The function call that generates this result. |
col.name |
An array of strings. The column names used in the fitting. |
appear |
An array of strings, the same length as the number of independent
variables. The strings are used to print a clean result, especially when
we are dealing with the factor variables, where the dummy variable
names can be very long due to the inserting of a random string to
avoid naming conflicts, see |
model |
A |
model.summary |
A |
model.random |
A |
terms |
A |
nobs |
The number of observations used to fit the model. |
data |
A |
origin.data |
The original |
Note that if grouping is done and there are multiple svm.madlib objects in the final result, each of them contains the same copy of model.
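Because a grouped result is a list of svm.madlib objects while an ungrouped result is a single object, code that consumes the result may want to treat both uniformly. The sketch below uses mock objects that only mimic the class names and the nobs item described above; mock_fit and as_fit_list are hypothetical helpers, not PivotalR functions.

```r
# Mock stand-ins for real fits: only the class attribute and the
# 'nobs' item follow the description in this page.
mock_fit <- function(n) structure(list(nobs = n), class = "svm.madlib")

single  <- mock_fit(100)
grouped <- structure(list(mock_fit(40), mock_fit(60)),
                     class = "svm.madlib.grps")

# Hypothetical helper: always return a plain list of svm.madlib objects.
as_fit_list <- function(fit) {
  if (inherits(fit, "svm.madlib.grps")) unclass(fit) else list(fit)
}

# Number of observations per fitted model, grouped or not.
nobs_single  <- sapply(as_fit_list(single),  function(f) f$nobs)
nobs_grouped <- sapply(as_fit_list(grouped), function(f) f$nobs)
print(nobs_single)
print(nobs_grouped)
```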
| is not part of the standard R formula object, but many R packages use | to add their own functionality to the formula object. However, | has different meanings and usages in different packages. The user must be careful: the usage of | in PivotalR-package may not be the same as in other packages.
Author: Predictive Analytics Team at Pivotal Inc.
Maintainer: Frank McQuillan, Pivotal Inc. fmcquillan@pivotal.io
madlib.lm, madlib.summary, madlib.arima are MADlib wrapper functions.
as.factor creates categorical variables for fitting.
delete safely deletes the result of this function.
## Not run:
## set up the database connection
## Assume that .port is port number and .dbname is the database name
cid <- db.connect(port = .port, dbname = .dbname, verbose = FALSE)
data <- as.db.data.frame(abalone, conn.id = cid, verbose = FALSE)
lk(data, 10)
## svm regression
## i.e. grouping on multiple columns
fit <- madlib.svm(length ~ height + shell | sex + (rings > 7), data = data, type = "regression")
fit
## use I(.) for expressions
fit <- madlib.svm(rings > 7 ~ height + shell + diameter + I(diameter^2),
data = data, type = "classification")
fit # display the result
## Adding new column for training
dat <- data
dat$arr <- db.array(data[,-c(1,2)])
array.data <- as.db.data.frame(dat)
fit <- madlib.svm(rings > 7 ~ arr, data = array.data)
db.disconnect(cid)
## End(Not run)