dglm: Distributed Generalized Linear Models
In glm.ddR: Distributed 'glm' for Big Data using 'ddR' API



The dglm function is intended as a distributed alternative to the glm function.

dglm(responses, predictors, family=gaussian, weights=NULL, 
na_action="exclude", start=NULL, etastart=NULL, mustart=NULL,
offset=NULL, control=list(...), method="dglm.fit.Newton",
completeModel=FALSE, ...)

`responses`	the darray that contains the vector of responses.
`predictors`	the darray that contains the predictors. dglm() does not accept a predictor with a constant value. Moreover, a categorical predictor should be expanded (converted to several predictors) before calling dglm().
`family`	the family function for the regression. The family-links supported at the time of this writing are gaussian(identity), binomial(logit), and poisson(log). These links are the defaults for their families, so specifying them is optional. The default family is gaussian.
`weights`	an optional darray of 'prior weights' to be used in the fitting process. It has a single column. Its number of rows and number of blocks should be the same as those of responses. The values must be non-negative (greater than or equal to zero). A sample with weight zero is ignored.
`na_action`	specifies what happens when the data contain missing values. NA, NaN, and Inf values in samples are treated as missing. The two options for this argument are "exclude" and "fail". With "exclude" (the default), the weight of any sample with missing values is set to zero, so that sample is ignored in the fitting process; in the darray created for residuals, the values corresponding to these samples are NA. With "fail", the function stops if the dataset contains any missing value.
`start`	starting values for coefficients. It is optional.
`etastart`	starting values for the parameter 'eta', which is used for computing deviance. It should be of type darray. It is optional.
`mustart`	starting values for the parameter 'mu', which is used for computing deviance. It should be of type darray. It is optional.
`offset`	an optional darray which can be used to specify an _a priori_ known component to be included in the linear predictor during fitting.
`control`	an optional list of controlling arguments. The optional elements of the list and their default values are: epsilon = 1e-8, maxit = 25, trace = FALSE, rigorous = FALSE.
`method`	this argument is reserved for future improvements. The only fitting method available at the moment is "dglm.fit.Newton". If new algorithms are developed in the future, this argument can be used to switch between them.
`completeModel`	when it is FALSE (default), the calculation of several output values that are not required for prediction is skipped, so the function can run faster.
`...`	arguments to be used to form the default `control` argument if it is not supplied directly.
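The restrictions on `predictors` (no constant columns, categorical variables expanded into several predictors) can be illustrated on a single machine with base R. The following is only a sketch: the data frame `df` and its columns are hypothetical, and it assumes that dglm() handles the intercept itself, which the ban on constant predictors suggests but this page does not state. The resulting matrix could then be converted with as.darray() as in the Examples.

```r
# Hypothetical data: one numeric and one categorical predictor.
df <- data.frame(
  wt  = c(2.6, 3.2, 2.3, 3.4, 3.5, 2.9),
  cyl = factor(c("4", "6", "4", "8", "8", "6"))
)

# model.matrix() expands the factor into indicator (dummy) columns.
# Its first column is the constant intercept, which is dropped here
# because a constant predictor is not accepted.
X <- model.matrix(~ wt + cyl, data = df)[, -1, drop = FALSE]

# Defensive check: drop any remaining column that has a single value.
is_constant <- apply(X, 2, function(col) length(unique(col)) == 1)
X <- X[, !is_constant, drop = FALSE]

colnames(X)
```

With the default treatment contrasts, the first factor level serves as the baseline and gets no column of its own.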

predictors and responses must align with each other (have the same number of rows and similar partitioning). Models created in either complete or incomplete mode can be used for prediction. The only motivation behind completeModel=FALSE is performance: the calculation of several values that are not required for prediction is skipped.

`coefficients`	calculated coefficients
`d.residuals`	(available only when completeModel=TRUE; otherwise it is NULL) the working residuals, that is the residuals in the final iteration of the IWLS fit. Since cases with zero weights are omitted, their working residuals are NA. It is of type darray.
`d.fitted.values`	the fitted mean values, obtained by transforming the linear predictors by the inverse of the link function. It is of type darray.
`family`	the family function used for regression
`d.linear.predictors`	the linear fit on link scale. It is of type darray.
`deviance`	up to a constant, minus twice the maximized log-likelihood.
`aic`	(available only when completeModel=TRUE; otherwise it is NA) a version of Akaike's An Information Criterion, minus twice the maximized log-likelihood plus twice the number of parameters, computed by the aic component of the family.
`null.deviance`	(available only when completeModel=TRUE; otherwise it is NA) the deviance for the null model, comparable with deviance.
`iter`	the number of iterations of IWLS used.
`prior.weights`	the weights initially supplied. All of its values are 1 if no initial weights were supplied. It is of type darray. The weight becomes 0 for samples with invalid data (NA, NaN, Inf).
`weights`	the working weights, that is the weights in the final iteration of the IWLS fit. It is of type darray. In order to save memory and execution time, no new darray will be created for weights when the initial weights are all 0 or 1, and it will simply be a reference to prior.weights.
`df.residual`	the residual degrees of freedom.
`df.null`	the residual degrees of freedom for the null model.
`converged`	logical. Was the IWLS algorithm judged to have converged?
`boundary`	logical. Is the fitted value on the boundary of the attainable values?
`responses`	the darray of responses.
`predictors`	the darray of predictors.
`na_action`	this item exists only when some samples are excluded because of missing data. It is a list containing the type "exclude" and the number of excluded samples.
`call`	the matched call.
`offset`	the offset darray used.
`control`	the value of the control argument used.
`method`	the name of the fitter function used, currently always "dglm.fit.Newton".
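Several of the values above (`iter`, `converged`, `deviance`, the working weights) refer to the IWLS iterations. For the canonical links listed under `family`, Newton's method and IWLS coincide, so the idea can be sketched on a single machine for the binomial(logit) case. This is not the package's implementation: `iwls_logit` is a hypothetical helper, the stopping rule is borrowed from stats::glm, and the defaults only mirror the `epsilon` and `maxit` values listed under `control`.

```r
# Single-machine IWLS (Newton) sketch for logistic regression.
# iwls_logit is a hypothetical helper, not part of glm.ddR.
iwls_logit <- function(X, y, epsilon = 1e-8, maxit = 25) {
  X <- cbind(Intercept = 1, X)        # add the intercept column
  beta <- rep(0, ncol(X))             # start from all-zero coefficients
  dev_old <- Inf
  converged <- FALSE
  for (iter in seq_len(maxit)) {
    eta <- drop(X %*% beta)           # linear predictor
    mu  <- plogis(eta)                # inverse logit link
    w   <- mu * (1 - mu)              # working weights
    z   <- eta + (y - mu) / w         # working response
    # Weighted least-squares update of the coefficients.
    beta <- drop(solve(crossprod(X, w * X), crossprod(X, w * z)))
    mu_new <- plogis(drop(X %*% beta))
    # For a 0/1 response the deviance is minus twice the log-likelihood.
    dev <- -2 * sum(y * log(mu_new) + (1 - y) * log(1 - mu_new))
    # Relative change in deviance, the same rule stats::glm uses.
    if (abs(dev - dev_old) / (abs(dev) + 0.1) < epsilon) {
      converged <- TRUE
      break
    }
    dev_old <- dev
  }
  list(coefficients = setNames(beta, colnames(X)),
       deviance = dev, iter = iter, converged = converged)
}

fit <- iwls_logit(data.matrix(mtcars[c("wt", "hp")]), mtcars$am)
ref <- glm(am ~ wt + hp, data = mtcars, family = binomial)
all.equal(unname(fit$coefficients), unname(coef(ref)), tolerance = 1e-3)
```

The comparison against stats::glm is there as a sanity check for the sketch; a distributed fit would compute the same cross-products block by block instead of on one matrix.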

Vishrut Gupta, Arash Fard

 ## Not run: 
    ## Example for linear regression
    library(glm.ddR)

    require(MASS)
    # creating the darray of response
    Y <- as.darray(data.matrix(Boston["medv"]))
    # creating the darray of predictors
    X <- as.darray(data.matrix(Boston[c("rad","crim","ptratio","dis")]))
    # building linear regression model
    reg <- dglm(Y,X, completeModel=TRUE)
    summary(reg)

    ## Example for logistic regression
    Y <- as.darray(data.matrix(mtcars["am"]))
    X <- as.darray(data.matrix(mtcars[c("wt","hp")]))

    # building logistic regression model
    myModel <- dglm(Y, X, binomial, completeModel=TRUE)
    summary(myModel)
    
    ## Example for poisson regression
    Y <- as.darray(data.matrix(mtcars["carb"]))
    X <- as.darray(data.matrix(mtcars[-which(colnames(mtcars)=="carb")]))
    # building poisson regression model
    reg <- dglm(Y,X, poisson, completeModel=TRUE)
    summary(reg)
 
## End(Not run)