This function creates an Oracle Data Mining Naive Bayes model.
database | Database ODBC channel identifier returned from a call to RODM_open_dbms_connection.
data_table_name | Database table/view containing the training dataset.
case_id_column_name | Column in data_table_name that uniquely identifies each row (case).
target_column_name | Target column name in data_table_name.
model_name | ODM model name.
auto_data_prep | Whether or not ODM should invoke automatic data preparation for the build.
class_priors | User-specified priors for the target classes.
retrieve_outputs_to_R | Flag controlling whether the output results are moved to the R environment.
leave_model_in_dbms | Flag controlling whether the model is deleted or left in the RDBMS.
sql.log.file | File to which the log of all SQL calls made by this function is appended.
Naive Bayes (NB) for classification makes predictions using Bayes' Theorem assuming that each attribute is conditionally independent of the others given a particular value of the target (Duda, Hart and Stork 2000). NB provides a very flexible general classifier for fast model building and scoring that can be used for both binary and multi-class classification problems.
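For reference, the conditional-independence assumption described above leads to the standard Naive Bayes decision rule. This is the textbook form, with $y$ denoting a target class and $x_1, \dots, x_n$ the attribute values of a case (notation introduced here for illustration); it is not taken from the ODM documentation, and the implementation details may differ.

```latex
P(y \mid x_1, \dots, x_n)
  = \frac{P(y) \prod_{i=1}^{n} P(x_i \mid y)}
         {\sum_{y'} P(y') \prod_{i=1}^{n} P(x_i \mid y')},
\qquad
\hat{y} = \arg\max_{y} \; P(y) \prod_{i=1}^{n} P(x_i \mid y)
```

Here $P(y)$ is the class prior and $P(x_i \mid y)$ are the per-attribute conditional probabilities estimated from the training data.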
For more details on the algorithm implementation, parameter settings and characteristics of the ODM function itself, consult the following Oracle documents: ODM Concepts, ODM Application Developer's Guide, Oracle SQL Packages: Data Mining, and Oracle Database SQL Language Reference (Data Mining functions), listed in the references below.
If retrieve_outputs_to_R is TRUE, returns a list with the following elements:
model.model_settings | Table of settings used to build the model.
model.model_attributes | Table of attributes used to build the model.
nb.conditionals | Table of conditional probabilities.
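To make the role of conditional probabilities and class priors concrete, here is a minimal base R sketch on a tiny made-up data frame. It does not call ODM and does not reproduce the actual layout of nb.conditionals; the data values and the variable names (train, conditionals, prior_vec, posterior) are hypothetical, and the way ODM applies priors internally may differ from this textbook computation.

```r
# Toy training data (hypothetical values, for illustration only)
train <- data.frame(
  sex      = c("female", "female", "male", "male", "male", "female"),
  survived = c("Yes",    "Yes",    "No",   "No",   "Yes",  "No"),
  stringsAsFactors = FALSE
)

# Conditional probabilities P(sex | survived): each column sums to 1
counts       <- table(train$sex, train$survived)
conditionals <- prop.table(counts, margin = 2)

# User-specified priors, in the same shape as the class_priors argument
priors <- data.frame(
  target_value      = c("Yes", "No"),
  prior_probability = c(0.1, 0.9)
)
prior_vec <- setNames(priors$prior_probability, priors$target_value)

# Posterior for a new case with sex == "female":
# prior times likelihood, normalized over the target classes
unnormalized <- prior_vec[colnames(conditionals)] * conditionals["female", ]
posterior    <- unnormalized / sum(unnormalized)
print(round(posterior, 3))
```

With a single attribute the product over attributes reduces to one factor; with several attributes, the per-attribute conditionals would be multiplied together before normalizing.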
Pablo Tamayo pablo.tamayo@oracle.com
Ari Mozes ari.mozes@oracle.com
Oracle Data Mining Concepts 11g Release 1 (11.1) http://download.oracle.com/docs/cd/B28359_01/datamine.111/b28129/toc.htm
Oracle Data Mining Application Developer's Guide 11g Release 1 (11.1) http://download.oracle.com/docs/cd/B28359_01/datamine.111/b28131/toc.htm
Oracle Data Mining Administrator's Guide 11g Release 1 (11.1) http://download.oracle.com/docs/cd/B28359_01/datamine.111/b28130/toc.htm
Oracle Database PL/SQL Packages and Types Reference 11g Release 1 (11.1) http://download.oracle.com/docs/cd/B28359_01/appdev.111/b28419/d_datmin.htm#ARPLS192
Oracle Database SQL Language Reference (Data Mining functions) 11g Release 1 (11.1) http://download.oracle.com/docs/cd/B28359_01/server.111/b28286/functions001.htm#SQLRF20030
RODM_apply_model, RODM_drop_model
# Predicting survival in the sinking of the Titanic based on passengers' sex, age, class, etc.
## Not run:
DB <- RODM_open_dbms_connection(dsn="orcl11g", uid= "rodm", pwd = "rodm")
data(titanic3, package="PASWR") # Load survival data from Titanic
ds <- titanic3[,c("pclass", "survived", "sex", "age", "fare", "embarked")] # Select subset of attributes
ds[,"survived"] <- ifelse(ds[,"survived"] == 1, "Yes", "No") # Rename target values
n.rows <- nrow(ds) # Number of rows
set.seed(seed=6218945)
random_sample <- sample(1:n.rows, ceiling(n.rows/2)) # Split dataset randomly in train/test subsets
titanic_train <- ds[random_sample,] # Training set
titanic_test <- ds[setdiff(1:n.rows, random_sample),] # Test set
RODM_create_dbms_table(DB, "titanic_train") # Push the training table to the database
RODM_create_dbms_table(DB, "titanic_test") # Push the testing table to the database
# If the target distribution does not reflect the actual distribution due
# to specialized sampling, specify priors for the model
priors <- data.frame(
target_value = c("Yes", "No"),
prior_probability = c(0.1, 0.9))
# Create an ODM Naive Bayes model
nb <- RODM_create_nb_model(
database = DB, # Database ODBC channel identifier
model_name = "titanic_nb_model", # ODM model name
data_table_name = "titanic_train", # (in quotes) Data frame or database table containing the input dataset
class_priors = priors, # user-specified priors
target_column_name = "survived") # Target column name in data_table_name
# Predict test data using the Naive Bayes model
nb2 <- RODM_apply_model(
database = DB, # Database ODBC channel identifier
data_table_name = "titanic_test", # Database table containing the input dataset
model_name = "titanic_nb_model", # ODM model name
supplemental_cols = "survived") # Carry the target column to the output for analysis
# Compute contingency matrix, performance statistics and ROC curve
print(nb2$model.apply.results[1:10,]) # Print example of prediction results
actual <- nb2$model.apply.results[, "SURVIVED"]
predicted <- nb2$model.apply.results[, "PREDICTION"]
probs <- as.numeric(as.character(nb2$model.apply.results[, "'Yes'"])) # as.real() is defunct in current R
table(actual, predicted, dnn = c("Actual", "Predicted")) # Confusion matrix
library(verification)
perf.auc <- roc.area(ifelse(actual == "Yes", 1, 0), probs) # Compute ROC and plot
auc.roc <- signif(perf.auc$A, digits=3)
auc.roc.p <- signif(perf.auc$p.value, digits=3)
roc.plot(ifelse(actual == "Yes", 1, 0), probs, binormal=TRUE, plot="both", xlab="False Positive Rate",
ylab="True Positive Rate", main= "Titanic survival ODM NB model ROC Curve")
text(0.7, 0.4, labels= paste("AUC ROC:", auc.roc)) # Reuse the rounded values computed above
text(0.7, 0.3, labels= paste("p-value:", auc.roc.p))
nb # look at the model details
RODM_drop_model(DB, "titanic_nb_model") # Drop the model
RODM_drop_dbms_table(DB, "titanic_train") # Drop the training table in the database
RODM_drop_dbms_table(DB, "titanic_test") # Drop the testing table in the database
RODM_close_dbms_connection(DB)
## End(Not run)
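The confusion-matrix step in the example can also be tried without a database connection. The sketch below uses hypothetical actual and predicted vectors standing in for the SURVIVED and PREDICTION columns of nb2$model.apply.results, and derives a few summary statistics in base R; the label values are made up for illustration.

```r
# Hypothetical labels standing in for the columns of nb2$model.apply.results
actual    <- c("Yes", "No", "No",  "Yes", "No", "Yes", "No", "No")
predicted <- c("Yes", "No", "Yes", "No",  "No", "Yes", "No", "No")

# Fix the level order so the table is always 2 x 2,
# even if one class is never predicted
lv <- c("No", "Yes")
cm <- table(factor(actual, levels = lv), factor(predicted, levels = lv),
            dnn = c("Actual", "Predicted"))

accuracy <- sum(diag(cm)) / sum(cm)
tpr <- cm["Yes", "Yes"] / sum(cm["Yes", ])  # true positive rate (sensitivity)
fpr <- cm["No", "Yes"] / sum(cm["No", ])    # false positive rate

print(cm)
print(c(accuracy = accuracy, TPR = tpr, FPR = fpr))
```

Fixing the factor levels up front is a small design choice: it keeps the row and column lookups by name valid even when the model predicts only one class on a small test set.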