computeLm: Fit Linear Model and return its coefficients.

Description

Outputs coefficients of the linear model fitted to an Aster table according to a formula expression containing column names. The zeroth coefficient corresponds to the intercept. The R formula expression, with column names for the response and predictor variables, is written exactly as in the lm function (though fewer features are supported).

Usage

computeLm(channel, tableName, formula, tableInfo = NULL, categories = NULL,
  sampleSize = 1000, where = NULL, test = FALSE)

Arguments

channel

connection object as returned by odbcConnect

tableName

Aster table name

formula

an object of class "formula" (or one that can be coerced to that class): a symbolic description of the model to be fitted. The details of model specification are given under 'Details'.

tableInfo

pre-built table summary with data types

categories

vector with names of columns containing categorical data. Optional for columns of character type, which are automatically treated as categorical predictors. A numerical column that contains categorical data has to be listed here for the model to treat it as categorical. Take extra care not to treat columns with too many distinct values (approximately > 10) as categorical, because each value results in a dummy predictor variable added to the model.

sampleSize

the function always computes regression model coefficients on all data in the table, but it computes predictions and returns an object of class "lm" based on a sample of the data. The sample size is the absolute number of rows in the sample. Be careful not to overestimate the size, as all results are loaded into memory. The special value "all" or "ALL" includes all data in the computation.

where

specifies criteria that table rows must satisfy before the computation is applied. The criteria are expressed as SQL predicates (as inside a WHERE clause).

test

logical: if TRUE, only show what would be done (similar to the parameter test in RODBC functions like sqlQuery and sqlSave).
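The note on categories above warns that each categorical value adds a dummy predictor. The sketch below illustrates that growth locally with base R's model.matrix; it does not call computeLm or Aster. The data frame df and its values are made up for illustration, and base R's default treatment coding (which drops one reference level) is an assumption here, since the encoding computeLm applies on the Aster side may differ.

```r
# Hypothetical local data; column names mirror the Examples section only for readability.
df <- data.frame(
  ba   = c(0.250, 0.310, 0.275, 0.290, 0.265, 0.300),
  rbi  = c(40, 85, 60, 72, 55, 90),
  lgid = c("AL", "NL", "AL", "NL", "AA", "AA"),
  stringsAsFactors = TRUE
)

# Expand the formula into a design matrix: the factor lgid becomes dummy columns.
mm <- model.matrix(ba ~ rbi + lgid, data = df)
dummy_cols <- colnames(mm)
print(dummy_cols)

# Number of distinct values in the categorical column; a column with many
# distinct values would add correspondingly many predictors.
n_levels <- nlevels(df$lgid)
print(n_levels)
```

Checking the number of distinct values in a candidate column before listing it in categories is a cheap way to avoid an unexpectedly wide model.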

Details

Models for computeLm are specified symbolically. A typical model has the form response ~ terms, where response is the (numeric) column and terms is a series of column terms that specify a linear predictor for response. A terms specification of the form first + second indicates all the terms in first together with all the terms in second, with duplicates removed. Specifications of the form first:second and first*second (interactions) are not supported yet.
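Because the formula syntax follows lm, the additive form and the removal of duplicate terms can be checked locally with base R before running anything against Aster. This is a minimal sketch using the built-in mtcars data set and stats::lm, not computeLm itself; the column names are those of mtcars, not of any Aster table.

```r
# Base R only; no Aster connection is involved.
# 'wt' is listed twice on purpose to show that duplicate terms are dropped.
f <- mpg ~ wt + hp + wt

# terms() parses the formula; "term.labels" holds the distinct predictor terms.
term_labels <- attr(terms(f), "term.labels")
print(term_labels)

# Fit locally with lm; the first coefficient is the intercept.
fit <- lm(f, data = mtcars)
coef_names <- names(coef(fit))
print(coef_names)
```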

Value

computeLm returns an object of class "toalm", "lm".

The function summary .....

For backward compatibility, outputs a data frame containing 3 columns:

coefficient_name

name of predictor table column, zeroth coefficient name is "0"

coefficient_index

index of predictor table column starting with 0

value

coefficient value
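To make the three-column layout concrete, the sketch below builds a data frame of the same shape from a local base R fit on mtcars. It is only an illustration of the column descriptions above: the object coef_table is hypothetical, and the exact types and ordering that computeLm returns are not confirmed here.

```r
# Local fit with base R; stands in for a model fitted on an Aster table.
fit <- lm(mpg ~ wt + hp, data = mtcars)
cf <- coef(fit)

# Assemble the layout described above: the intercept is named "0" and has index 0.
coef_table <- data.frame(
  coefficient_name  = c("0", names(cf)[-1]),
  coefficient_index = seq_along(cf) - 1L,
  value             = unname(cf),
  stringsAsFactors  = FALSE
)
print(coef_table)
```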

Examples

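if(interactive()){
# RODBC provides odbcDriverConnect; toaster provides computeLm
library(RODBC)
library(toaster)

# initialize connection to Lahman baseball database in Aster
# (paste0 keeps the connection string free of an embedded line break)
conn = odbcDriverConnect(connection=paste0("driver={Aster ODBC Driver};",
                         "server=<dbhost>;port=2406;database=<dbname>;uid=<user>;pwd=<pw>"))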

# batting average explained by rbi, bb, so 
lm1 = computeLm(channel=conn, tableName="batting_enh", formula= ba ~ rbi + bb + so)
summary(lm1)

# with category predictor league and explicit sample size
lm2 = computeLm(channel=conn, tableName="batting_enh", formula= ba ~ rbi + bb + so + lgid,
                sampleSize=10000, where="lgid in ('AL','NL') and ab > 30")
summary(lm2)
}

teradata-aster-field/toaster documentation built on May 31, 2019, 8:36 a.m.