ds.impute_bkp: Linear Regression
In stefvanbuuren/dsMiceClient: Distributed Multiple Imputations


Computes linear models. It can be used to fit univariate, multivariate and weighted linear models. It can also be used to compute single-stratum analysis of variance and analysis of covariance.

ds.impute_bkp(coef.table = NULL, coef.frm = NULL, datasources = NULL)

`datasources`	a list of opal object(s) obtained after logging in to the opal servers; these objects also hold the data assigned to R, as a `data frame`, from the opal datasources.
`formula`	a character string that can be coerced to an object of class `formula`. It is a symbolic description of the model to be fitted. The details of the model specification are given in the 'Details' section.

Models for ds.linear are specified symbolically. A typical model has the form response ~ terms, where response is the numeric response vector and terms is a series of terms that specifies a linear predictor for response. A terms specification of the form first + second indicates all the terms in first together with all the terms in second, with duplicates removed. A specification of the form first:second indicates the set of terms obtained by taking the interactions of all terms in first with all terms in second. The specification first*second indicates the cross of first and second, which is the same as first + second + first:second.
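The operators described above can be checked in base R without any server connection. The following is a minimal sketch that only uses `terms()` and `as.formula()` from the stats package; the variable names y, a and b are placeholders and the helper `term_labels` is hypothetical, not part of dsMiceClient.

```r
# Inspect how R expands formula operators into individual terms.
# y, a and b are placeholder names; no data is needed for this step.
term_labels <- function(f) attr(terms(f), "term.labels")

plus_terms  <- term_labels(y ~ a + b + a)  # first + second, duplicates removed
inter_terms <- term_labels(y ~ a:b)        # interaction term only
cross_terms <- term_labels(y ~ a * b)      # main effects plus interaction

print(plus_terms)
print(inter_terms)
print(cross_terms)

# A character string can be coerced to a formula object.
f <- as.formula("y ~ a * b")
print(class(f))
```

Printing the term labels before sending a formula to the servers is a cheap way to confirm that the specification expands to the intended predictors.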

Non-NULL weights can be used to indicate that different observations have different variances.
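To illustrate how weights enter a least-squares fit, the sketch below uses base R only. In `lm()`, weights are supplied one per row, and the weighted coefficients solve (X'WX) b = X'Wy with W the diagonal matrix of weights. The data are simulated, and the usage above does not show a weights argument for the distributed function, so this is an assumption-laden illustration of the general technique rather than of this package's behaviour.

```r
set.seed(2)
n <- 50
x <- rnorm(n)
w <- runif(n, 0.5, 2)                  # hypothetical per-row weights
y <- 3 + 1.5 * x + rnorm(n) / sqrt(w)  # noise variance shrinks as the weight grows

X <- cbind(1, x)                       # design matrix with an intercept column

# Multiplying by w scales row i of X (and element i of y) by w[i].
A_w <- crossprod(X, w * X)             # X' W X
g_w <- crossprod(X, w * y)             # X' W y
b_w <- drop(solve(A_w, g_w))

# Reference: base R weighted least squares.
fit_w <- coef(lm(y ~ x, weights = w))
print(all.equal(unname(b_w), unname(fit_w)))
```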

In the case of distributed univariate and multivariate linear regression, the coefficients are computed by least squares using the method of matrices. Mathematically, the method of matrices takes the same approach as the other least-squares methods, but the data are first transformed into the matrices A and g.

According to Walpole (1993), the method of matrices consists of building the matrix A and the matrix g and calculating the coefficients from the equation b = A^{-1} g. In distributed environments without data sharing, the solution is to compute the matrices A and g on each data node and return these results to the central node. The central node combines the per-node matrices A and g and computes the regression coefficients from the same equation, b = A^{-1} g.
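The two-step scheme above can be sketched in base R. This is not the package's implementation: it assumes that A is the cross-product X'X of the design matrix and g is X'y, and that the central node combines the per-node results by summation. The simulated data and the helpers `make_node` and `node_summary` are hypothetical.

```r
set.seed(1)

# Two simulated "data nodes"; the data and column names are made up.
make_node <- function(n) {
  x1 <- rnorm(n)
  x2 <- rnorm(n)
  data.frame(y = 1 + 2 * x1 - 0.5 * x2 + rnorm(n), x1 = x1, x2 = x2)
}
nodes <- list(make_node(40), make_node(60))

# Node-side step: each node returns only A = X'X and g = X'y,
# never the individual rows.
node_summary <- function(d) {
  X <- model.matrix(y ~ x1 + x2, data = d)
  list(A = crossprod(X), g = crossprod(X, d$y))
}
parts <- lapply(nodes, node_summary)

# Central step: add the per-node matrices, then solve A b = g.
A <- Reduce(`+`, lapply(parts, `[[`, "A"))
g <- Reduce(`+`, lapply(parts, `[[`, "g"))
b <- drop(solve(A, g))

# Reference fit on the pooled data, for comparison only.
pooled <- coef(lm(y ~ x1 + x2, data = do.call(rbind, nodes)))
print(all.equal(unname(b), unname(pooled)))
```

Calling `solve(A, g)` solves the linear system directly, which gives the same b as A^{-1} g without forming the inverse explicitly. The summation works because X'X and X'y over the pooled rows are sums of the per-node cross-products.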

ds.linear calls the server-side function matrixMethod2DS to compute the matrices A and g.

Returns a list with the following components:

`call`	the model formula.
`coefficients`	a vector of linear regression coefficients.
`n.rows`	numeric, the sample size.
`sum.y`	numeric, the sum of the elements of the dependent variable y.
`sum.xtx`	matrix, the combined A matrix.