ds.linear: Linear Regression

Description

Fits linear models. It can be used to fit univariate, multivariate and weighted linear models. It can also be used to carry out single-stratum analysis of variance and analysis of covariance.

Usage

ds.linear(formula, weight = 1, type = "combine", checks = FALSE,
  datasources = NULL)

Arguments

formula

a character that can be coerced to an object of class formula. It is a symbolic description of the model to be fitted. Details of the model specification are given in the 'Details' section.

weight

a character, the name of an optional vector of weights to be used in the fitting process. The vector should be NULL or numeric. If it is not NULL, weighted least squares is used; otherwise ordinary least squares is used. See also 'Details'.

type

a character which represents the type of analysis to carry out. If type is set to 'combine', a single global model is fitted across all studies; if type is set to 'split', the model is fitted separately for each study.

checks

a boolean. If TRUE, checks are run to verify elements on the server side. Such checks lengthen the run time, so the default is FALSE; switch them on (set to TRUE) when faced with errors.

datasources

a list of opal object(s) obtained after logging in to opal servers; these objects also hold the data assigned to R, as data frames, from opal datasources.

Details

Models for ds.linear are specified symbolically. A typical model has the form response ~ terms, where response is a numeric vector and terms is a series of terms which specifies a linear predictor for response. A terms specification of the form first + second indicates all the terms in first together with all the terms in second, with duplicates removed. A specification of the form first:second indicates the set of terms obtained by taking the interactions of all terms in first with all terms in second. The specification first*second indicates the cross of first and second, which is the same as first + second + first:second.
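The operator rules above are those of ordinary R formulas, so they can be checked locally with base R's terms() before sending a model to the servers. The sketch below is only an illustration: y, a and b are placeholder names, and it assumes that ds.linear interprets its formula string the same way base R does.

```r
# Expand formula operators with base R's terms().
# y, a and b are placeholder variable names; no data is needed.
labels_of <- function(f) attr(terms(f), "term.labels")

plus_terms  <- labels_of(y ~ a + b)      # main effects only
colon_terms <- labels_of(y ~ a:b)        # interaction only
star_terms  <- labels_of(y ~ a * b)      # main effects plus interaction
dup_terms   <- labels_of(y ~ a + b + a)  # the duplicate 'a' is dropped

print(plus_terms)
print(colon_terms)
print(star_terms)
print(dup_terms)
```

Comparing star_terms with the other two shows that a * b expands to a + b + a:b.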

Non-NULL weights can be used to indicate that different observations have different variances.
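To illustrate what the weights do, the following sketch fits a weighted least squares model on simulated local data in two ways: by solving the weighted normal equations directly and with base R's lm(). The data, the weights w and all variable names are made up for the example; it does not call ds.linear and makes no claim about how the package applies weights on the server side.

```r
set.seed(1)
n <- 50
x <- runif(n)
w <- runif(n, 0.5, 2)                        # hypothetical observation weights
y <- 1 + 2 * x + rnorm(n, sd = 1 / sqrt(w))  # noise variance proportional to 1 / w

X <- cbind(1, x)                 # design matrix with an intercept column
A <- t(X) %*% (w * X)            # X' W X (row i of X is scaled by w[i])
g <- t(X) %*% (w * y)            # X' W y
b_wls <- as.numeric(solve(A, g)) # solve A b = g without forming the inverse

# Reference fit with base R
b_lm <- unname(coef(lm(y ~ x, weights = w)))

print(b_wls)
print(b_lm)
```

Observations with larger weights (smaller assumed variance) pull the fitted line more strongly towards themselves.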

In the case of distributed univariate and multivariate linear regression, the coefficients are computed by least squares using the method of matrices. Mathematically, the method of matrices follows the same approach as the other methods, but the data are transformed into matrices A and g.

According to Walpole (1993), the method of matrices consists of building the matrix A and the matrix g, and calculating the coefficients from the equation b = A^{-1} g. In distributed environments without data sharing, the solution is to compute the matrices A and g at each data node and return these results to the central node. The central node combines the A and g matrices from all nodes and computes the regression coefficients from the equation b = A^{-1} g.

ds.linear calls the server side function matrixMethod2DS, to compute the matrices A and g.
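The combination step can be sketched in plain R. The example below simulates three data nodes, computes A = X'X and g = X'y on each, sums them at a "central node" and solves for b. It is a minimal local sketch under the assumption that combining means element-wise summation; node_summary and make_node are hypothetical helpers, not the package's matrixMethod2DS, and no opal servers are involved.

```r
set.seed(42)

# Three hypothetical data nodes, each holding its own rows.
make_node <- function(n) {
  x <- rnorm(n)
  data.frame(x = x, y = 3 - 1.5 * x + rnorm(n))
}
nodes <- list(make_node(40), make_node(60), make_node(25))

# Per-node step: only the small matrices A and g leave the node,
# never the individual-level rows.
node_summary <- function(d) {
  X <- model.matrix(y ~ x, data = d)
  list(A = crossprod(X),        # X'X
       g = crossprod(X, d$y),   # X'y
       n = nrow(X))
}
parts <- lapply(nodes, node_summary)

# Central node: add the pieces and solve A b = g.
A <- Reduce(`+`, lapply(parts, `[[`, "A"))
g <- Reduce(`+`, lapply(parts, `[[`, "g"))
b <- drop(solve(A, g))

# Pooled fit on the stacked data, for comparison only.
pooled <- coef(lm(y ~ x, data = do.call(rbind, nodes)))

print(b)
print(pooled)
```

Because X'X and X'y are sums over rows, adding the per-node matrices gives the same A and g as the pooled data would, so b matches the pooled least squares fit up to numerical precision.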

Value

Returns a list with the following components:

call

the model formula.

coefficients

a vector of linear regression coefficients.

n.rows

numerical, the sample size.

sum.y

numerical, the sum of elements for a given dependent variable y.

sum.xtx

matrix, the combined A matrix.

Dependencies

matrixMethod2DS

Author(s)

Paula Raissa Costa e Silva

References

Walpole (1993); bibliography key walpole1993probability in the distStatsClient package.

See Also

Other regressions: ds.logistic, ds.poisson, getDerivative

Examples

# Uses the defaults: type = "combine", checks = FALSE
fit <- ds.linear('D$maternal_age~D$birth_weight')

paularaissa/distStatsClient documentation built on June 19, 2019, 12:43 a.m.