clydeExport: Export to Clyde Analytical Platform

Description

Export the calculations in an R script to an .rda file that can be imported into the Clyde Analytical Platform. This allows models fitted in R to be used as derived columns or packaged into RESTful web services for real-time and batch scoring.

Usage

clydeExport(exportFileName, predFuncName, predColumnList, libraryList = NULL)

Arguments

exportFileName

Name of the export file, including the .rda extension.

predFuncName

Name of the user-defined function that makes the prediction, as a string (see 'Details').

predColumnList

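Names of the columns returned by the prediction function, either as a character vector (all columns are then treated as 'numeric') or as a named list giving the type of each column (see 'Details').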

libraryList

Character vector of the packages needed by the prediction function. If NULL (the default), only the packages attached by default can be used inside the prediction function.
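To decide what belongs in libraryList, one can compare the packages a prediction function relies on with those attached by default. The sketch below is not part of clyde: packagesForLibraryList is a hypothetical helper, and it assumes that the R session running the exported function attaches the same default packages as the local session (base plus getOption("defaultPackages")), which this page does not confirm.

```r
# Hypothetical helper, NOT part of the clyde package.
# Assumption: the R session that runs the exported function attaches the same
# default packages as the local session, i.e. base plus
# getOption("defaultPackages").
packagesForLibraryList <- function(pkgs,
                                   defaults = c("base", getOption("defaultPackages"))) {
  # keep only the packages that would not be attached by default
  needed <- setdiff(pkgs, defaults)
  # mirror clydeExport()'s default of NULL when nothing extra is needed
  if (length(needed) == 0) NULL else needed
}

# The GLM example relies only on base and stats
packagesForLibraryList(c("base", "stats"))
# The Random Forest example also relies on randomForest
packagesForLibraryList(c("stats", "randomForest"))
```

Passing an explicit defaults vector makes the result independent of the local session's options.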

Details

The script must contain a user-defined function that takes an explicit list of formal arguments and returns a data frame. The predFuncName argument gives the name of this function as a string (see 'Examples'). The predColumnList argument lists the columns computed by that function, as either a character vector or a named list. If it is a character vector, all computed columns are assumed to be of type 'numeric'. Otherwise, it must be a named list giving the return type of every computed column, each one of 'integer', 'numeric' or 'factor' (see the Random Forest example). Set libraryList to the packages the prediction function needs that are not attached by default.
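Because the prediction function's output must match predColumnList, it can help to run the function on a few sample rows before exporting. The following is a minimal sketch, not part of the clyde package: normalizeColumnTypes, checkPredFunc and toyPredict are hypothetical names, and the check only mirrors the rules stated above (explicit formal arguments, a data frame result, and 'integer', 'numeric' or 'factor' column types). It assumes the sample data frame has one column per formal argument of the prediction function.

```r
# Hypothetical helpers, NOT part of the clyde package: a local sanity check
# that mirrors the rules in 'Details' before calling clydeExport().

# Turn predColumnList into a named list of column types. A plain character
# vector means every column is assumed to be 'numeric'.
normalizeColumnTypes <- function(predColumnList) {
  if (is.character(predColumnList)) {
    types <- as.list(rep("numeric", length(predColumnList)))
    names(types) <- predColumnList
    return(types)
  }
  allowed <- c("integer", "numeric", "factor")
  if (!is.list(predColumnList) || is.null(names(predColumnList)) ||
      !all(unlist(predColumnList) %in% allowed)) {
    stop("predColumnList must be a character vector or a named list of ",
         "'integer', 'numeric' or 'factor'")
  }
  predColumnList
}

# Call the prediction function on sample inputs, matching the columns of
# sampleData to its formal arguments by name, and compare the result with
# predColumnList.
checkPredFunc <- function(predFuncName, predColumnList, sampleData) {
  f <- match.fun(predFuncName)
  argNames <- names(formals(f))
  if ("..." %in% argNames) {
    stop("the prediction function must list its arguments explicitly")
  }
  res <- do.call(f, as.list(sampleData[argNames]))
  if (!is.data.frame(res)) stop("the prediction function must return a data frame")
  types <- normalizeColumnTypes(predColumnList)
  if (!setequal(names(res), names(types))) {
    stop("returned columns do not match predColumnList")
  }
  for (col in names(types)) {
    ok <- switch(types[[col]],
                 integer = is.integer(res[[col]]),
                 numeric = is.numeric(res[[col]]),
                 factor  = is.factor(res[[col]]))
    if (!ok) stop(sprintf("column '%s' is not of type '%s'", col, types[[col]]))
  }
  invisible(TRUE)
}

# Usage with a toy prediction function (also hypothetical)
toyPredict <- function(a, b) data.frame(s = a + b, flag = as.integer(a > b))
sampleData <- data.frame(a = c(1, 5), b = c(2, 3))
checkPredFunc("toyPredict", list(s = "numeric", flag = "integer"), sampleData)
```

The check stops with an error on the first mismatch, so a silent return means the sample output agrees with predColumnList.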

Examples

# GLM
data <- data.frame(
   x1 = c(5,10,15,20,30,40,60,80,100),
   x2 = c(118,58,42,35,27,25,21,19,18),
   y = c(69,35,26,21,18,16,13,12,12))

g1 <- glm(y ~ x1 + x2, data=data)

# function storing the calculations; its arguments are the predictors used in the glm model
# and it returns a data frame with one column named 'y_pred'
glmPredict <- function(x1, x2) {
   df <- as.data.frame(cbind(x1, x2))
   res <- as.data.frame(predict(g1, newdata=df, type="response"))
   # set the column name(s) for the returned data frame
   names(res) <- "y_pred"
   return(res)
}

# the argument predFuncName names the user-defined function storing the calculations
# the argument predColumnList is set to the column names of the function result
# libraryList is not needed, since the glm prediction uses only the base and stats packages
clydeExport("glm.rda", "glmPredict", c("y_pred"))


# Random Forest
library(randomForest)
data(iris)
# replace dots (.) in the column names with underscores (_)
names(iris) <- gsub("\\.", "_", names(iris))

set.seed(71)
rf <- randomForest(Species ~ ., data=iris)

# this function takes as arguments the predictors used in the RF model
# returns a data frame with one column named 'Species_pred'
rfPredict <- function(Sepal_Length, Sepal_Width, Petal_Length, Petal_Width) {
   df <- as.data.frame(cbind(Sepal_Length, Sepal_Width, Petal_Length, Petal_Width))
   res <- as.data.frame(predict(rf, newdata=df))
   names(res) <- "Species_pred"
   return(res)
}

# the predColumnList is a named list, specifying that the returned value is a factor
# package 'randomForest' is needed in the 'rfPredict' function, so set the libraryList argument
clydeExport("rf.rda", "rfPredict", list(Species_pred = "factor"), libraryList = c("randomForest"))

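# use multiple models and return multiple columns from the prediction function
library(rpart)
data(iris)
names(iris) <- gsub("\\.", "_", names(iris))

# logistic regression for "is virginica" and a classification tree for Species
g2 <- glm(I(Species == "virginica") ~ ., data=iris, family=binomial(logit))
t1 <- rpart(Species ~ ., data=iris)

allPredict <- function(Sepal_Length, Sepal_Width, Petal_Length, Petal_Width) {
   df <- as.data.frame(cbind(Sepal_Length, Sepal_Width, Petal_Length, Petal_Width))
   # predict on the supplied inputs (newdata), not on the training data
   res <- as.data.frame(predict(g2, newdata=df, type="response"))
   names(res) <- c("virginica_pred")
   res$Species_pred <- predict(t1, newdata=df, type="class")
   # 1 if both models agree on whether the flower is virginica, 0 otherwise
   res$agree <- as.integer((res$virginica_pred > 0.5) == (res$Species_pred == "virginica"))
   return(res)
}

# the returned columns have different types, so predColumnList is a named list;
# 'rpart' is passed as libraryList because it is not attached by default
clydeExport("all.rda", "allPredict",
   list(virginica_pred = "numeric", Species_pred = "factor", agree = "integer"),
   libraryList = c("rpart"))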

eliademicu/clyde documentation built on Sept. 3, 2020, 12:02 a.m.