# combMeanCoef: Mean Coefficient Recombination (datadr: Divide and Recombine for Large, Complex Data)

## Description

Mean coefficient recombination: calculates the weighted average of the parameter estimates from a model fit to each subset.

## Usage

```
combMeanCoef(...)
```

## Arguments

 `...` additional attributes to define the combiner (currently only used internally)

## Details

`combMeanCoef` is passed to the `combine` argument of `recombine`.

This method calculates the mean of each model coefficient when the same model has been fit to every subset via a transformation. The mean is a weighted average of each coefficient, where the weights are the number of observations in each subset. In particular, the `drLM` and `drGLM` functions should be used to add the transformation to the distributed data object (ddo) that will be recombined using `combMeanCoef`.
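The weighting described above can be sketched in base R, without datadr. This is only an illustration of the arithmetic (sum of `n_i * coefficient_i` divided by the sum of `n_i`), not datadr's internal implementation; the names `irisIrr`, `subsets`, `fits`, `n`, `coefs`, and `weightedCoef` are hypothetical, and the subset sizes (40, 37, 46) are chosen arbitrarily so that the weights differ.

```r
# Base R sketch of a weighted mean of per-subset coefficients.
# Assumption: this mirrors the arithmetic only, not datadr's internals.

# Build an irregular data set: 40, 37, and 46 rows for the three species
irisIrr <- iris[c(1:40, 51:87, 101:146), ]

# Split the data into subsets, one per species
subsets <- split(irisIrr, irisIrr$Species)

# Fit the same linear model to every subset
fits <- lapply(subsets, function(d) lm(Sepal.Length ~ Sepal.Width, data = d))

# Number of observations in each subset, used as the weights
n <- vapply(subsets, nrow, integer(1))

# One row of coefficients per subset (rows: species, columns: coefficients)
coefs <- t(vapply(fits, coef, numeric(2)))

# Weighted mean of each coefficient: sum(n_i * coef_i) / sum(n_i)
# `coefs * n` recycles n down the rows, so row i is scaled by n[i]
weightedCoef <- colSums(coefs * n) / sum(n)
weightedCoef
```

Because the weights are the subset sizes, a subset with more observations pulls the combined coefficient toward its own estimate; with equal subset sizes the result reduces to the plain mean of the coefficients.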

## Author(s)

Ryan Hafen

## See Also

`divide`, `recombine`, `rrDiv`, `combCollect`, `combDdo`, `combDdf`, `combRbind`, `combMean`
## Examples

```
library(datadr)

# Create an irregular number of observations for each species
indexes <- sort(c(sample(1:50, 40), sample(51:100, 37), sample(101:150, 46)))
irisIrr <- iris[indexes,]

# Create a distributed data frame using the irregular iris data set
bySpecies <- divide(irisIrr, by = "Species")

# Fit a linear model of Sepal.Length vs. Sepal.Width for each species
# using 'drLM()' (or we could have used 'drGLM()' for a generalized linear model)
lmTrans <- function(x) drLM(Sepal.Length ~ Sepal.Width, data = x)
bySpeciesFit <- addTransform(bySpecies, lmTrans)

# Average the coefficients from the linear model fits of each species, weighted
# by the number of observations in each species
out1 <- recombine(bySpeciesFit, combine = combMeanCoef)
out1

# A more concise (and readable) way to do it
bySpecies %>%
  addTransform(lmTrans) %>%
  recombine(combMeanCoef)

# The following illustrates an equivalent, but more tedious approach
lmTrans2 <- function(x) t(c(coef(lm(Sepal.Length ~ Sepal.Width, data = x)), n = nrow(x)))
res <- recombine(addTransform(bySpecies, lmTrans2), combine = combRbind)
colnames(res) <- c("Species", "Intercept", "Sepal.Width", "n")
res
out2 <- c("(Intercept)" = with(res, sum(Intercept * n) / sum(n)),
          "Sepal.Width" = with(res, sum(Sepal.Width * n) / sum(n)))

# These are the same
identical(out1, out2)
```