null.data: A Data Set with lots of 'NA' values


Description

An example data.frame that is used by the examples in this user manual.

Usage

data(null.data)

Format

This data has 104 columns and 2000 rows.

Details

This data set contains many NA values. Using as.db.data.frame, one can load the data set into the connected database, where all the NA values are converted into NULL values.

The MADlib wrapper functions such as madlib.lm and madlib.glm throw an error if there are NULL values in the data, so the data must be cleaned before using the regression functions supplied by MADlib.
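
As a rough local illustration of this kind of cleanup, the sketch below uses only base R on a small made-up data.frame. The name toy and its columns are hypothetical stand-ins, not part of null.data, and nothing here touches the database; for a table that already lives in the database, the Examples section filters rows with is.na instead.

```r
## Toy stand-in for a data.frame with missing values
## (hypothetical column names and values, for illustration only)
toy <- data.frame(
    y  = c(1.0, NA, 3.0, 4.0),
    x1 = c(0.5, 0.7, NA, 0.9),
    x2 = c(10, 20, 30, 40)
)

## count the NA values in each column
na.counts <- colSums(is.na(toy))
na.counts

## keep only the rows that have no NA in any column
clean <- toy[complete.cases(toy), ]
dim(clean)
```

Checking the per-column counts first shows which columns would lose the most rows before any rows are dropped.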

Note

Lazy data loading is enabled in this package, so the user does not need to explicitly run data(null.data) to load the data. It is loaded the first time it is used.

Examples

## Not run: 


## set up the database connection
## Assume that .port is port number and .dbname is the database name
cid <- db.connect(port = .port, dbname = .dbname, verbose = FALSE)

## create a table from the example data.frame "null.data"
delete("null_data", conn.id = cid)
x <- as.db.data.frame(null.data, "null_data", conn.id = cid, verbose = FALSE)

## ERROR, because of NULL values
fit <- madlib.lm(sf_mrtg_pct_assets ~ ris_asset + lncrcd + lnauto +
                 lnconoth + lnconrp + intmsrfv + lnrenr1a + lnrenr2a +
                 lnrenr3a, data = x)

## select columns
y <- x[,c("sf_mrtg_pct_assets","ris_asset", "lncrcd","lnauto",
          "lnconoth","lnconrp","intmsrfv","lnrenr1a","lnrenr2a",
          "lnrenr3a")]

dim(y)

## remove NULL values
for (i in 1:10) y <- y[!is.na(y[i]),]

dim(y)

fit <- madlib.lm(sf_mrtg_pct_assets ~ ., data = y)

fit

db.disconnect(cid, verbose = FALSE)

## End(Not run)

pivotalsoftware/PivotalR documentation built on March 18, 2021, 9:37 a.m.