dglm: Distributed Generalized Linear Models


View source: R/dglm.R

Description

The dglm function is intended to be a distributed alternative to the glm function.

Usage

dglm(responses, predictors, family=gaussian, weights=NULL, 
na_action="exclude", start=NULL, etastart=NULL, mustart=NULL,
offset=NULL, control=list(...), method="dglm.fit.Newton",
completeModel=FALSE, ...)

Arguments

responses

the darray that contains the vector of responses.

predictors

the darray that contains the predictors. dglm() cannot accept a predictor with a constant value. Moreover, a categorical predictor should be converted to several numeric predictors before it is passed to dglm().

family

specifies the family function for the regression. The family-link combinations supported at the time of this writing are gaussian(identity), binomial(logit), and poisson(log). These links are the defaults for their families, so specifying them is optional. The default family is gaussian.

weights

an optional darray of 'prior weights' to be used in the fitting process. It has a single column, and its number of rows and number of blocks should match those of responses. The values must be non-negative. A sample with weight zero is ignored.

na_action

indicates what should happen when the data contain missing values. Values of NA, NaN, and Inf in samples are treated as missing values. There are two options for this argument: "exclude" and "fail". When "exclude" is selected (the default), the weight of any sample with missing values becomes zero, and that sample is ignored in the fitting process. In the darray created for residuals, the values corresponding to these samples are NA. When "fail" is selected, the function stops if the dataset contains any missing value.

start

starting values for coefficients. It is optional.

etastart

starting values for parameter 'eta' which is used for computing deviance. It should be of type darray. It is optional.

mustart

starting values for parameter 'mu' which is used for computing deviance. It should be of type darray. It is optional.

offset

an optional darray which can be used to specify an _a priori_ known component to be included in the linear predictor during fitting.

control

an optional list of controlling arguments. The optional elements of the list and their default values are: epsilon = 1e-8, maxit = 25, trace = FALSE, rigorous = FALSE.

method

this argument is reserved for future improvements. The only fitting method available at the moment is "dglm.fit.Newton". If new algorithms are developed in the future, this argument can be used to switch between them.

completeModel

when it is FALSE (default), the calculation of several output values that are not required for prediction is skipped, so the function can run faster.

...

arguments to be used to form the default control argument if it is not supplied directly.
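
The requirements on predictors, weights, and na_action described above can be illustrated on ordinary in-memory data before it is converted to darrays. The following is a minimal sketch in base R, not part of glm.ddR: the data frame, its column names, and the variable names are all made up for illustration, and it assumes that dropping constant columns and flagging non-finite rows by hand mirrors what dglm() expects and what na_action="exclude" does internally.

```r
# Toy data; every name below is illustrative and not part of glm.ddR.
df <- data.frame(
  y     = c(1.2, 3.4, NA, 2.2, 5.1, 4.0),
  x1    = c(0.5, 1.5, 2.5, Inf, 4.5, 5.5),
  group = factor(c("a", "b", "c", "a", "b", "c")),
  const = 1
)

# Convert the categorical predictor into indicator columns.
# [, -1] drops the "(Intercept)" column, which is itself constant.
indicators <- model.matrix(~ group, data = df)[, -1, drop = FALSE]
X <- cbind(x1 = df$x1, const = df$const, indicators)

# Drop predictors that take a single value, since dglm() rejects them.
is_constant <- apply(X, 2, function(col) {
  length(unique(col[is.finite(col)])) <= 1
})
X <- X[, !is_constant, drop = FALSE]

Y <- data.matrix(df["y"])

# Rows containing NA, NaN, or Inf are the ones that na_action = "exclude"
# is documented to give weight zero; here the same rule is applied by hand.
invalid <- !is.finite(Y[, 1]) | rowSums(!is.finite(X)) > 0
w <- as.numeric(!invalid)

print(colnames(X))
print(w)
```

After this preparation, X and Y could be converted with as.darray() in the same way as in the Examples section.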

Details

predictors and responses must align with each other (have the same number of rows and similar partitioning). Models created in either complete or incomplete mode can be used for prediction. The only motivation behind completeModel=FALSE is performance: the calculation of several values that are not required for prediction is skipped.
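
To see why a model fitted with completeModel=FALSE is still sufficient for prediction, note that a GLM prediction needs only the coefficients and the inverse link of the family. The sketch below uses stats::glm on a built-in dataset as a stand-in for dglm (an assumption made so the example runs without a distributed backend); the new observations in new_X are made up.

```r
# Stand-in fit with stats::glm; dglm is assumed to produce comparable
# coefficients for the same model.
fit <- glm(am ~ wt + hp, family = binomial, data = mtcars)

beta <- coef(fit)     # intercept followed by one coefficient per predictor
fam  <- fit$family    # carries the inverse link function

# Hypothetical new observations.
new_X <- cbind(wt = c(2.5, 3.5), hp = c(110, 180))

# Linear predictor, then the inverse link to obtain the mean response.
eta <- drop(cbind(1, new_X) %*% beta)
mu  <- fam$linkinv(eta)

# Reference values from predict() for comparison.
reference <- predict(fit, newdata = as.data.frame(new_X), type = "response")
print(cbind(manual = mu, reference = unname(reference)))
```

Residuals, AIC, and the null deviance do not enter this computation, which is consistent with them being skipped in incomplete mode.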

Value

coefficients

calculated coefficients

d.residuals

(available only when completeModel=TRUE; otherwise it is NULL) the working residuals, that is the residuals in the final iteration of the IWLS fit. Since cases with zero weights are omitted, their working residuals are NA. It is of type darray.

d.fitted.values

the fitted mean values, obtained by transforming the linear predictors by the inverse of the link function. It is of type darray.

family

the family function used for regression

d.linear.predictors

the linear fit on link scale. It is of type darray.

deviance

up to a constant, minus twice the maximized log-likelihood.

aic

(available only when completeModel=TRUE; otherwise it is NA) a version of Akaike's An Information Criterion, minus twice the maximized log-likelihood plus twice the number of parameters, computed by the aic component of the family.

null.deviance

(available only when completeModel=TRUE; otherwise it is NA) the deviance for the null model, comparable with deviance.

iter

the number of iterations of IWLS used.

prior.weights

the weights initially supplied. All values are 1 if no initial weights were supplied. It is of type darray. The weight becomes 0 for samples with invalid data (NA, NaN, Inf).

weights

the working weights, that is the weights in the final iteration of the IWLS fit. It is of type darray. In order to save memory and execution time, no new darray will be created for weights when the initial weights are all 0 or 1, and it will simply be a reference to prior.weights.

df.residual

the residual degrees of freedom.

df.null

the residual degrees of freedom for the null model.

converged

logical. Was the IWLS algorithm judged to have converged?

boundary

logical. Is the fitted value on the boundary of the attainable values?

responses

the darray of responses.

predictors

the darray of predictors.

na_action

this item exists only when some samples are excluded because of missing data. It is a list containing the type "exclude" and the number of excluded samples.

call

the matched call.

offset

the offset darray used.

control

the value of the control argument used.

method

the name of the fitter function used, currently always "dglm.fit.Newton".
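
The relationships among deviance, aic, df.residual, and df.null described above can be checked numerically. The following sketch uses stats::glm with the poisson family as a stand-in, on the assumption that dglm reports these quantities with the same definitions; the variable names are illustrative.

```r
# Stand-in Poisson fit on a built-in dataset.
fit <- glm(carb ~ wt + hp, family = poisson, data = mtcars)

n  <- nrow(mtcars)
p  <- length(coef(fit))          # number of estimated parameters
ll <- as.numeric(logLik(fit))    # maximized log-likelihood

# aic: minus twice the maximized log-likelihood plus twice the
# number of parameters.
aic_by_hand <- -2 * ll + 2 * p

# deviance: minus twice the maximized log-likelihood, up to a constant.
# For the Poisson family the constant is twice the log-likelihood of
# the saturated model, in which each fitted mean equals its response.
ll_saturated <- sum(dpois(mtcars$carb, lambda = mtcars$carb, log = TRUE))
deviance_by_hand <- 2 * (ll_saturated - ll)

print(c(aic = fit$aic, by_hand = aic_by_hand))
print(c(deviance = fit$deviance, by_hand = deviance_by_hand))
print(c(df.residual = fit$df.residual, df.null = fit$df.null))
```

With an intercept in the model, df.null is n - 1 and df.residual is n - p.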

Author(s)

Vishrut Gupta, Arash Fard

Examples

 ## Not run: 
    ## Example for linear regression
    library(glm.ddR)

    library(MASS)  # provides the Boston dataset
    # creating the darray of response
    Y <- as.darray(data.matrix(Boston["medv"]))
    # creating the darray of predictors
    X <- as.darray(data.matrix(Boston[c("rad","crim","ptratio","dis")]))
    # building linear regression model
    reg <- dglm(Y,X, completeModel=TRUE)
    summary(reg)

    ## Example for logistic regression
    Y <- as.darray(data.matrix(mtcars["am"]))
    X <- as.darray(data.matrix(mtcars[c("wt","hp")]))

    # building logistic regression model
    myModel <- dglm(Y, X, binomial, completeModel=TRUE)
    summary(myModel)
    
    ## Example for Poisson regression
    Y <- as.darray(data.matrix(mtcars["carb"]))
    X <- as.darray(data.matrix(mtcars[-which(colnames(mtcars)=="carb")]))
    # building Poisson regression model
    reg <- dglm(Y,X, poisson, completeModel=TRUE)
    summary(reg)
 
## End(Not run)    

glm.ddR documentation built on May 29, 2017, 6:49 p.m.