buildFromDF: Build Predictive Models from DataFrames


Description

Builds a new code prediction model from data stored in data frames. The function takes two mandatory two-column data frames, 'df.text' and 'df.code', which are described under Arguments.

Usage

buildFromDF(df.text, df.code, modelType = "SVM", multiLabMod = "!d",
  minCodSiz = 0)

Arguments

df.text

a data frame with 2 columns: 'ID' (a unique line identifier) and 'TEXT' (free text).

df.code

a data frame with 2 columns: 'ID' (a line identifier matching 'ID' in 'df.text') and 'DIAG' (alphanumeric codes).

modelType

the model type, one of "SVM" (Support Vector Machine) or "NB" (Naive Bayes). Defaults to "SVM".

multiLabMod

a string specifying how to handle multilabeled texts (texts associated with more than one code). Use "d" to duplicate them, so that the same text is included once per code; any other value, such as the default "!d", ignores multilabeled texts.

minCodSiz

the minimum number of texts per code (numeric). If this argument is set to K, codes with fewer than K texts are bootstrapped to reach this minimum. The default of 0 disables bootstrapping.
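The effect of 'multiLabMod' and 'minCodSiz' can be illustrated with a minimal base-R sketch. This is not the package's internal code: the helpers joinLabels() and bootstrapSmallCodes() are hypothetical names introduced here, and the sketch assumes that "ignoring" a multilabeled text means dropping it from the training data and that bootstrapping means resampling rows with replacement.

```r
# Illustrative only: hypothetical helpers, not part of mlcodage.
df.text <- data.frame(ID = c("T1", "T2", "T3"),
                      TEXT = c("text one", "text two", "text three"),
                      stringsAsFactors = FALSE)
# T1 is multilabeled: it carries both code "A" and code "B".
df.code <- data.frame(ID = c("T1", "T1", "T2", "T3"),
                      DIAG = c("A", "B", "A", "C"),
                      stringsAsFactors = FALSE)

# Join texts to codes; with "d" multilabeled texts are duplicated,
# with any other value they are dropped (assumed meaning of "ignore").
joinLabels <- function(df.text, df.code, multiLabMod = "!d") {
  if (multiLabMod != "d") {
    counts <- table(df.code$ID)
    singleIds <- names(counts)[counts == 1]
    df.code <- df.code[df.code$ID %in% singleIds, ]
  }
  merge(df.text, df.code, by = "ID")
}

# Resample rows with replacement until every code has at least minCodSiz rows.
bootstrapSmallCodes <- function(df, minCodSiz = 0) {
  parts <- lapply(split(df, df$DIAG), function(g) {
    missing <- minCodSiz - nrow(g)
    if (missing > 0) {
      extra <- g[sample.int(nrow(g), missing, replace = TRUE), ]
      g <- rbind(g, extra)
    }
    g
  })
  out <- do.call(rbind, parts)
  rownames(out) <- NULL
  out
}

dup  <- joinLabels(df.text, df.code, multiLabMod = "d")   # T1 kept, once per code
ign  <- joinLabels(df.text, df.code, multiLabMod = "!d")  # T1 dropped
boot <- bootstrapSmallCodes(dup, minCodSiz = 3)           # each code padded to 3 rows
```

With 'minCodSiz' left at 0, bootstrapSmallCodes() returns its input unchanged, which mirrors the documented default.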

Value

an object of type "svm" or "nb", depending on the specified 'modelType' argument. To use the built model within the 'mlcodage' Web service, add it to the 'data' folder of the package as an RData object with the extension '.model.RData', then re-compile the package.
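Saving a model under the required extension might look like the sketch below. The file name 'mymodel' and the variable name 'model' are assumptions made for illustration; this page does not state which object name the Web service expects. A temporary directory stands in for the package's 'data' folder, and a plain list stands in for the object returned by buildFromDF so that the snippet runs without the package.

```r
# Placeholder for the object returned by buildFromDF() (assumption).
model <- list(type = "svm")

# Stand-in for the package's 'data' folder; use the real path in practice.
dataDir <- file.path(tempdir(), "data")
dir.create(dataDir, showWarnings = FALSE)

# The file name must end in ".model.RData"; "mymodel" is a hypothetical name.
modelFile <- file.path(dataDir, "mymodel.model.RData")
save(model, file = modelFile)

# Reload into a separate environment to check what was stored.
restored <- new.env()
loadedNames <- load(modelFile, envir = restored)
```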

Examples

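# Two texts, each identified by a unique ID
df.text <- data.frame(ID = c("T1", "T2"), TEXT = c("text number one", "text number two"))
# One code per text, matched to df.text through the ID column
df.code <- data.frame(ID = c("T1", "T2"), DIAG = c("CODE1", "CODE2"))
# Build an SVM model, ignoring multilabeled texts and without bootstrapping
model <- buildFromDF(df.text, df.code, modelType = "SVM", multiLabMod = "!d", minCodSiz = 0)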

IM-APHP/mlcodage documentation built on May 8, 2019, 10:52 a.m.