as.factor-methods: Convert one column of a 'db.obj' object into a categorical...


Description

Convert one column of a db.obj object into a categorical variable. When madlib.lm or madlib.glm is applied to a db.obj with categorical columns, dummy columns are created and fitted. The reference level for regressions can be selected using relevel.
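
As a rough illustration of what "dummy columns" means, the sketch below uses base R on an in-memory data set rather than PivotalR on a database table. The vectors `sex` and `length` are made-up values for illustration only, and PivotalR's in-database dummy-column naming and internals may differ from what `model.matrix` produces here.

```r
# Base R analogue (in memory, not PivotalR): how a categorical column
# expands into dummy columns in a regression design matrix.
# The data below are hypothetical values chosen for illustration.
sex <- factor(c("F", "I", "M", "M", "F"))
length <- c(0.45, 0.33, 0.53, 0.44, 0.50)

# Levels are stored in sorted order, so "F" comes first and acts as
# the reference level under the default treatment contrasts.
print(levels(sex))

# model.matrix() creates one 0/1 column per non-reference level.
design <- model.matrix(~ sex + length)
print(colnames(design))
```

The reference level gets no column of its own; its effect is absorbed into the intercept, and each dummy coefficient is read relative to it.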

Usage

## S4 method for signature 'db.obj'
as.factor(x)

## S4 method for signature 'db.obj'
relevel(x, ref, ...)

Arguments

x

A db.obj object. It must have only one column.

ref

A single value giving the reference level to use in the regressions.

...

Other arguments passed into the result. Not implemented yet.

Value

A db.Rquery object with a single categorical column. By default, regressions select the reference level automatically, usually the minimum of all levels; use relevel to change it.
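
The effect of changing the reference level can be sketched with base R factors, which follow a similar convention (the first level in sorted order is the default reference). This is an in-memory analogue under that assumption, not the PivotalR implementation; the variable names `sex` and `sex_m` are hypothetical.

```r
# Base R analogue (not PivotalR): default reference level and relevel().
sex <- factor(c("F", "I", "M", "M", "F"))

# The default reference is the first level in sorted order.
default_ref <- levels(sex)[1]
print(default_ref)

# relevel() moves the chosen level to the front; the remaining levels
# keep their relative order.
sex_m <- relevel(sex, ref = "M")
print(levels(sex_m))

# The dummy columns now describe the other levels relative to "M".
print(colnames(model.matrix(~ sex_m)))
```

Only the interpretation of the coefficients changes with the reference level; the fitted values of the regression stay the same.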

Author(s)

Author: Predictive Analytics Team at Pivotal Inc.

Maintainer: Frank McQuillan, Pivotal Inc. fmcquillan@pivotal.io

See Also

madlib.lm and madlib.glm can fit models with categorical variables.

When as.db.data.frame creates a table/view, it can create dummy variables for a categorical variable.

Examples

## Not run: 
## get help for a method
## help("as.factor,db.obj-method")



## set up the database connection
## Assume that .port is port number and .dbname is the database name
cid <- db.connect(port = .port, dbname = .dbname, verbose = FALSE)

## create a temporary table from the example data.frame "abalone"
x <- as.db.data.frame(abalone, conn.id = cid, verbose = FALSE)

## set sex to be a categorical variable
x$sex <- as.factor(x$sex)

fit1 <- madlib.lm(rings ~ . - id, data = x) # linear regression

fit2 <- madlib.glm(rings < 10 ~ . - id, data = x, family = "binomial") # logistic regression

## another temporary table
z <- as.db.data.frame(abalone, conn.id = cid, verbose = FALSE)

## specify factor during fitting
fit3 <- madlib.lm(rings ~ as.factor(sex) + length + diameter, data = z)

## as.factor is applied automatically to text columns,
## so calling it explicitly is not necessary
fit4 <- madlib.glm(rings < 10 ~ sex + length + diameter, data = z,
                   family = "binomial")

## use relevel to change the reference level
x$sex <- relevel(x$sex, ref = "M")
madlib.lm(rings ~ . - id, data = x) # use "M" as the reference level

db.disconnect(cid, verbose = FALSE)

## End(Not run)

PivotalR documentation built on March 13, 2021, 1:06 a.m.