anova_moulton: anova_moulton


View source: R/anova_moulton.R

Description

ANOVA for analyses that include clustered variables. Corrects the F values for intraclass correlation within a known grouping using the Moulton formula (Mostly Harmless Econometrics: An Empiricist's Companion, Joshua D. Angrist and Joern-Steffen Pischke), and corrects the degrees of freedom by the between-within method (Singer JD. Using SAS PROC MIXED to fit multilevel models, hierarchical models, and individual growth models. Journal of Educational and Behavioral Statistics. 1998;24:323-355).
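
For orientation, the Moulton factor given by Angrist and Pischke for a single regressor with unequal cluster sizes can be written as below. The symbols are notation chosen here, not names used by the package: n_g is the size of cluster g, \bar{n} the mean cluster size, V(n_g) the variance of the cluster sizes, \rho_x the intraclass correlation of the regressor and \rho_e that of the residuals. How anova_moulton combines these quantities internally is not shown on this page, so the second line is only the textbook consequence for a single-coefficient F statistic, which equals \hat\beta^2 / V(\hat\beta).

```latex
\[
M \;=\; \frac{V(\hat\beta)}{V_c(\hat\beta)}
  \;=\; 1 + \left[ \frac{V(n_g)}{\bar{n}} + \bar{n} - 1 \right] \rho_x \, \rho_e
\]
\[
F_{\mathrm{adjusted}} \;=\; \frac{F_{\mathrm{conventional}}}{M}
\]
```

Here V_c(\hat\beta) denotes the conventional variance estimate that ignores clustering. With equal cluster sizes, V(n_g) = 0 and the factor reduces to 1 + (\bar{n} - 1) \rho_x \rho_e.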

Usage

anova_moulton(linmod,data_object,grouping_col,use_predictors=TRUE,method="unbiased")

Arguments

linmod

Standard linear model fitted with lm. Do not include the clustering variable in the model.

data_object

The data frame that was also used to fit linmod.

grouping_col

The name of the column in data_object that indicates the clustering. Thus, data_object[,grouping_col] should be a factor indicating which observation belongs to which cluster.

use_predictors

Switch indicating whether the linear predictors in linmod should also be used in the calculation of the Moulton factor. In general, they should; setting use_predictors=FALSE is intended for debugging purposes only.

method

Method argument for the moulton_factor estimation, ultimately passed on to ICC.

Value

An object of class ANOVA, including F and P values adjusted for clustering by the Moulton method, with between-within adjustment of the degrees of freedom. The Moulton factors themselves are returned as attr(x,"moulton") and the adjusted degrees of freedom as attr(x,"df2"), where x is the returned object.

Author(s)

Marina + Thomas Braschler

References

"Mostly Harmless Econometrics: An Empiricist's Companion", Angrist J.D., Pischke J.S., 2008

"Using SAS PROC MIXED to fit multilevel models, hierarchical models, and individual growth models." Singer JD. Journal of Educational and Behavioral Statistics. 1998;24:323-355

Examples

# Load the package that provides anova_moulton
library(moultonTools)
# Typical example of clustering. Looking only at the linear regression between the outcome and regressor
# columns of data_object below, there is a clear trend and a relatively large number of data points.
# As a consequence, simple linear regression gives a significant result
data_object = data.frame(
  outcome=c(1,2,5,6,12,5,2,3,20,19,10,15,13,20,14),
  regressor=c(1,1,1,3,3,3,7,7,7,7,10,10,10,10,10),
  clusters=c("A","A","B","A","B","B","B","B","C","C","C","C","C","C","C"))
linmod=lm(outcome~regressor,data_object)
summary(linmod)
# However, knowledge of the cluster column (the last column of data_object, labelled "clusters") indicates
# that this might also arise because the regressor values tend to be associated with the clusters "A"-"C".
# Intraclass correlation analysis indeed shows a tendency for the outcome data to cluster along the cluster variable
ICC(data_object$outcome,data_object$clusters)
# and also clustering of the regressor values within the clusters
ICC(data_object$regressor,data_object$clusters)

# This means, however, that the outcome values are partial replicates within each cluster and, because of the
# very imbalanced experimental design, they also line up with the regressor values.
# Indeed, Moulton analysis shows that the result should probably not be considered significant
anova_moulton(linmod,data_object,"clusters")
# Aggregating both the outcome and the regressor within the clusters leads to a similar conclusion:
# the trend seems to be there, but the effective n is too low
aggregated_data_object=aggregate(data_object[,c("outcome","regressor")],by=list(clusters=data_object$clusters),FUN=mean)
summary(lm(outcome~regressor,aggregated_data_object))
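
The quantities entering the correction can be reproduced roughly in base R. The sketch below is not the package's implementation: icc_anova and moulton_sketch are hypothetical helper names, the intraclass correlation uses the one-way ANOVA estimator, and the variance of the cluster sizes is taken as a population variance. All of these are assumptions made for illustration, so the numbers may differ from those returned by ICC and anova_moulton.

```r
# Hypothetical helper (not part of moultonTools): one-way ANOVA estimator
# of the intraclass correlation, allowing unequal cluster sizes.
icc_anova <- function(x, g) {
  g <- factor(g)
  n_g <- as.numeric(table(g))                  # cluster sizes, in level order
  k <- length(n_g)                             # number of clusters
  N <- length(x)                               # number of observations
  group_means <- as.numeric(tapply(x, g, mean))
  msb <- sum(n_g * (group_means - mean(x))^2) / (k - 1)  # between-cluster mean square
  msw <- sum((x - ave(x, g))^2) / (N - k)                # within-cluster mean square
  n0 <- (N - sum(n_g^2) / N) / (k - 1)         # effective cluster size
  (msb - msw) / (msb + (n0 - 1) * msw)
}

# Hypothetical helper: Moulton factor for one regressor x and residuals e.
# Assumption: V(n_g) is computed as a population variance.
moulton_sketch <- function(x, e, g) {
  n_g <- as.numeric(table(factor(g)))
  n_bar <- mean(n_g)
  v_n <- mean((n_g - n_bar)^2)
  1 + (v_n / n_bar + n_bar - 1) * icc_anova(x, g) * icc_anova(e, g)
}

# Same data as in the example above
outcome <- c(1,2,5,6,12,5,2,3,20,19,10,15,13,20,14)
regressor <- c(1,1,1,3,3,3,7,7,7,7,10,10,10,10,10)
clusters <- c("A","A","B","A","B","B","B","B","C","C","C","C","C","C","C")

fit <- lm(outcome ~ regressor)
m <- moulton_sketch(regressor, residuals(fit), clusters)

# Deflate the conventional F value by the Moulton factor
f_conventional <- anova(fit)["regressor", "F value"]
f_adjusted <- f_conventional / m
print(c(moulton = m, F_conventional = f_conventional, F_adjusted = f_adjusted))
```

This sketch leaves the degrees of freedom untouched; the between-within adjustment that anova_moulton applies is not reproduced here.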

tbgitoo/moultonTools documentation built on Nov. 8, 2021, 6:15 p.m.