predict: Generate the 'db.Rquery' object that can calculate the...


Description

Generates a db.Rquery object that computes predictions from a linear, logistic, or generalized linear regression model fitted with madlib.lm or madlib.glm. The object holds a SQL query; use lk to run it and view the actual result.

Usage

## S3 method for class 'lm.madlib'
predict(object, newdata, ...)

## S3 method for class 'lm.madlib.grps'
predict(object, newdata, ...)

## S3 method for class 'logregr.madlib'
predict(object, newdata, type = c("response", "prob"), ...)

## S3 method for class 'logregr.madlib.grps'
predict(object, newdata, type = c("response", "prob"), ...)

## S3 method for class 'glm.madlib'
predict(object, newdata, type = c("response", "prob"), ...)

## S3 method for class 'glm.madlib.grps'
predict(object, newdata, type = c("response", "prob"), ...)

Arguments

object

The fitted model returned by madlib.lm or madlib.glm.

newdata

A db.obj object that refers to the data in the database on which the predictions are computed.

type

A string, default is "response". For logistic regression models it produces the TRUE or FALSE prediction; for other models it produces the predicted values for newdata. The alternative value is "prob", which is only used for binomial(logit) models and computes the probabilities for the TRUE cases.

...

Extra parameters. Not implemented yet.
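
To make the two values of type concrete, the following base R sketch computes both kinds of output by hand for a logistic model. It does not use PivotalR or a database. The coefficients, the data, and the 0.5 cutoff that turns a probability into TRUE or FALSE are illustrative assumptions, not values taken from PivotalR or MADlib.

```r
## Illustrative only: hand-rolled logistic prediction in base R.
## 'coefs' and 'newdata' are made-up values, not PivotalR output.
coefs <- c(intercept = -1.0, length = 2.0)
newdata <- data.frame(length = c(0.0, 1.0, 1.5))

## linear predictor: intercept + coefficient * feature
eta <- coefs[["intercept"]] + coefs[["length"]] * newdata$length

## analogue of type = "prob": probability of the TRUE case
prob <- 1 / (1 + exp(-eta))

## analogue of type = "response": TRUE/FALSE prediction
## (assumes a 0.5 cutoff, which this page does not state)
response <- prob > 0.5

print(data.frame(newdata, prob = prob, response = response))
```

With PivotalR the same distinction would be requested through the type argument of predict, and the computation would run inside the database rather than in R.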

Value

A db.Rquery object, which contains the SQL query to compute the predictions.

Author(s)

Author: Predictive Analytics Team at Pivotal Inc.

Maintainer: Frank McQuillan, Pivotal Inc. fmcquillan@pivotal.io

See Also

madlib.lm linear regression

madlib.glm generalized linear models, including logistic regression

lk view the actual result

groups.lm.madlib, groups.lm.madlib.grps, groups.logregr.madlib, groups.logregr.madlib.grps extract grouping column information from the fitted model(s).

Examples

## Not run: 



## set up the database connection
## Assume that .port is port number and .dbname is the database name
cid <- db.connect(port = .port, dbname = .dbname, verbose = FALSE)

## create db.table object pointing to a data table
delete("abalone", conn.id = cid)
x <- as.db.data.frame(abalone, "abalone", conn.id = cid, verbose = FALSE)

## Example 1 --------

fit <- madlib.lm(rings ~ . - sex - id, data = x)

fit

pred <- predict(fit, x) # prediction

content(pred)

ans <- x$rings # the actual value

lk((ans - pred)^2, 10) # squared error

lk(mean((ans - pred)^2)) # mean squared error

## Example 2 ---------

y <- x
y$sex <- as.factor(y$sex)
fit <- madlib.lm(rings ~ . - id, data = y)

lk(mean((y$rings - predict(fit, y))^2))

## Example 3 ---------

fit <- madlib.lm(rings ~ . - id | sex, data = x)

fit

pred <- predict(fit, x)

content(pred)

ans <- x$rings

lk(mean((ans - pred)^2))

## predictions for one group of data where sex = I
idx <- which(groups(fit)[["sex"]] == "I") # which sub-model
pred1 <- predict(fit[[idx]], x[x$sex == "I",]) # predict on part of data

## Example 4 --------

## plot the predicted values vs. the true values
ap <- ans # true values
ap$pred <- pred # add a column which is the predicted values

## If the data set is very big, you do not want to load all the
## data points into R and plot. We can just plot a random sample.
random.sample <- lk(sort(ap, FALSE, NULL), 1000) # sort randomly

plot(random.sample)

## ------------------------------------------------------------
## GLM prediction

fit <- madlib.glm(rings ~ . - id | sex, data = x, family = poisson(log),
                  control = list(max.iter = 20))

p <- predict(fit, x) # predict on the data the model was fitted to

lk(p, 10)

db.disconnect(cid, verbose = FALSE)

## End(Not run)
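
For readers without a database at hand, the per-group behaviour of the grouped fit (rings ~ . - id | sex) can be mimicked in plain R. The sketch below is only an analogy: it uses the built-in mtcars data and stats::lm instead of the abalone table and madlib.lm, and the names by_cyl, fits, and pred6 are hypothetical.

```r
## Illustrative only: per-group linear models in base R (no database).
## Uses the built-in 'mtcars' data instead of the abalone table.
by_cyl <- split(mtcars, mtcars$cyl)   # one data frame per group
fits <- lapply(by_cyl, function(d) lm(mpg ~ wt, data = d))

## predict each row with the model fitted to that row's group
pred <- numeric(nrow(mtcars))
for (g in names(fits)) {
  idx <- mtcars$cyl == as.numeric(g)
  pred[idx] <- predict(fits[[g]], mtcars[idx, ])
}

## mean squared error over all rows
mse <- mean((mtcars$mpg - pred)^2)

## predictions for one group only, using just that group's sub-model
pred6 <- predict(fits[["6"]], by_cyl[["6"]])
```

Each group gets its own coefficients, so predicting a subset of rows with the matching sub-model gives the same values as the corresponding rows of the full prediction.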

PivotalR documentation built on March 13, 2021, 1:06 a.m.