
WGA R Package & Tutorial

Our package builds on the multilevel package developed by Paul Bliese. However, our package differs from the multilevel package in two important ways. First, our functions require a bit more information in order to compute the agreement statistics. Second, we believe the burden of providing this additional information is offset by more comprehensive output, designed to be easily integrated into descriptions of data aggregation.

By generating a more comprehensive set of results, we hope researchers will use these results to enhance the clarity, transparency, and reproducibility of future multilevel research. Below we provide a brief tutorial on the use of our package. At this point, our package is hosted on GitHub, but we plan to make it available via CRAN, too.

Basic Functions Included with WGA: RWG, RWGJ, AWG, & AD

To illustrate the different functions, we walk the reader through a simple tutorial. The following code can be pasted into R or RStudio. The first two lines install the "devtools" package and load it into the library of active resources. The third line installs the "WGA" package from GitHub.

install.packages("devtools", repos = "http://cran.us.r-project.org")
library(devtools)
devtools::install_github("james-lebreton/WGA", force = TRUE)

Once the "WGA" package is installed from GitHub, you may load it into your library of active resources.

Step 1 – Load the WGA Package

library(WGA)

Step 2 – Load Data

To illustrate our functions, we will use the lq2002 data that are included in the multilevel package. The multilevel package is loaded automatically along with the WGA package. lq2002 contains data from 2042 soldiers nested in 49 companies of the U.S. Army. The following code loads the data and shows the first 15 variables for the first 10 soldiers.

data(lq2002, package = 'multilevel')

lq2002[1:10,1:15]

Step 3a - Estimate Agreement on a Single Item using rWG, aWG, AD

Some users may wish to estimate agreement using a specific agreement statistic. Thus, our package includes separate functions for estimating within-unit agreement on a single item (e.g., agreement among soldiers nested in a common Army company on a specific item asking about leadership climate).

Each of these functions requires the user to provide information about:
a) x, the item of interest,
b) grpid, the clustering or grouping variable,
c) scale, a vector containing the lowest and highest values of the response scale (e.g., scale = c(1,5)),
d) model, a user-supplied description of the multilevel data aggregation measurement model (e.g., “Consensus”),
e) reset, a logical argument that takes values of TRUE or FALSE and indicates whether or not out-of-range values of rWG should be reset to zero, and
f) cutoff, a user-supplied cutoff for justifying data aggregation.

Below we present the code we used to estimate rWG values for the sixth item measuring soldier perceptions of leadership (i.e., lq2002$LEAD06), which was measured on a scale ranging from 1 to 5 (i.e., scale = c(1,5)). We described our model as a consensus model and indicated that we did not want out-of-range values to be automatically reset to zero. We specified a cutoff of .51, reflecting a moderate degree of agreement. We will focus our attention on the estimates of rWG computed using a uniform null distribution (reflecting no response biases) and those computed using a slightly skewed null distribution (reflecting a slight leniency bias). The latter allows for the possibility that some soldiers may be reluctant to make negative statements about their commanding officers. We saved the results as a new object named fit1.

fit1<-RWG(x=lq2002$LEAD06, grpid=lq2002$COMPID, scale=c(1,5), model="Consensus", reset=F, cutoff=.51)
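For readers who want to see what is being estimated, here is a minimal base-R sketch of single-item rWG under a uniform null: rWG = 1 - s²x / σ²EU. Here s²x is the observed within-group variance and σ²EU = (A² - 1)/12 is the expected variance of a uniform null for an A-point scale. The toy data and the helper names (toy, sigma2_uniform, rwg_item) are hypothetical illustrations. They are not part of WGA, and this sketch is not the RWG source code.

```r
# Hypothetical toy data: 4 respondents in each of 3 groups, 5-point item
toy <- data.frame(
  grp  = rep(c("A", "B", "C"), each = 4),
  item = c(4, 4, 5, 4,   2, 5, 1, 4,   3, 3, 3, 3)
)

# Expected error variance of a uniform (rectangular) null for an A-point scale
sigma2_uniform <- function(A) (A^2 - 1) / 12

# Single-item rWG: 1 minus the ratio of observed to null variance
rwg_item <- function(x, sigma2_eu) 1 - var(x) / sigma2_eu

A <- 5  # number of response options on the 1-5 scale
rwg_un <- tapply(toy$item, toy$grp, rwg_item, sigma2_eu = sigma2_uniform(A))
round(rwg_un, 3)
```

Group B's observed variance (about 3.33) exceeds the uniform null variance of 2.00, so its estimate is negative. This is the kind of out-of-range value the reset argument governs.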

Step 4a - Examine Results

The output generated by each of these functions is a list containing different pieces of information (i.e., elements). We can use the names function to see the labels assigned to each element in the list, before examining each of the elements. In addition, the previous code automatically generates histograms plotting the distribution of within-group agreement statistics.

names(fit1)

We now examine each element in fit1. The first element provides descriptive statistics for the agreement indices used to help justify data aggregation. From fit1, we see that groups had, on average, a relatively modest level of agreement on the LEAD06 item: using a uniform null distribution, the mean rWG value was .51. Not surprisingly, this mean value decreased when agreement was estimated under null distributions that account for potential response biases. In addition, as we shrink the size of the null error variance used to estimate rWG (i.e., by using null distributions that account for various response biases), we are more likely to obtain out-of-range estimates of rWG (i.e., rWG < 0.00).

fit1$rwg.descriptives
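The link between a smaller null error variance and out-of-range estimates follows directly from the rWG formula. The sketch below holds a hypothetical observed variance fixed and divides it by progressively smaller null variances. Apart from 2.0 (the uniform null for a 5-point scale), the null variances are illustrative values, not the ones WGA uses.

```r
# Hypothetical observed within-group variance for a single group
s2_obs <- 1.2

# Progressively smaller null error variances: 2.0 is the uniform null for a
# 5-point scale; the remaining values are illustrative only
sigma2_null <- c(2.0, 1.5, 1.0, 0.5)

# Same observed variance evaluated against each null
rwg <- 1 - s2_obs / sigma2_null
data.frame(sigma2_null, rwg = round(rwg, 2), out_of_range = rwg < 0)
```

Once the null variance drops below the observed variance, the estimate turns negative.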

The next element in fit1 summarizes the proportion of groups with rWG values exceeding the user-specified cutoff (rWG > 0.51 in our example). Given that this analysis is for a single item (LEAD06), it is not surprising that only 59% of the groups had rwg.un values greater than our cutoff.

fit1$rwg.over.cutoff
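Conceptually, the proportion reported in rwg.over.cutoff is the share of per-group estimates above the cutoff. Here is a minimal sketch with hypothetical per-group values. It assumes a strict "greater than" comparison, as the word "exceeding" suggests. WGA's exact comparison is not shown here.

```r
# Hypothetical per-group rWG values (one estimate per group)
rwg_un <- c(0.72, 0.45, 0.61, 0.30, 0.55, 0.80, 0.49, 0.66)
cutoff <- 0.51

# Proportion of groups whose rWG exceeds the cutoff (mean of a logical vector)
prop_over <- mean(rwg_un > cutoff)
prop_over
```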

The next element in fit1 summarizes the distribution of rWG values, broken into percentiles. From this distribution, we get a better picture of how agreement varies across the 49 groups. Examining the distribution of rwg.un estimates, we see that few groups (~10%) had rWG values indicative of "strong agreement" (i.e., > 0.70 using the heuristics recommended by LeBreton and Senter (2008)); the 90th percentile for rwg.un was only 0.72.

fit1$rwg.percentiles
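A percentile summary like this one can be approximated with base R's quantile(). The rWG values and the particular percentiles requested below are assumptions for illustration. WGA may report a different set of percentiles or use a different quantile definition. quantile() defaults to type 7.

```r
# Hypothetical per-group rWG values
rwg_un <- c(0.72, 0.45, 0.61, 0.30, 0.55, 0.80, 0.49, 0.66)

# Percentiles of the rWG distribution across groups
pct <- quantile(rwg_un, probs = c(0.10, 0.25, 0.50, 0.75, 0.90))
round(pct, 3)

# Share of groups showing "strong agreement" (rWG > 0.70)
mean(rwg_un > 0.70)
```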

The next element summarizes how many of the groups had out-of-range rWG values (i.e., rWG < 0.00) and clearly states whether the researcher requested that out-of-range values be reset to zero or retained in the analysis. Not surprisingly, when rWG was estimated using heavily skewed null distributions or those reflecting a strong central tendency bias, more groups had negative estimates of agreement. The most likely explanation for these negative values is that these null distributions are not appropriate for this particular analysis. As noted earlier, we are focusing our attention on the estimates of rWG computed using the uniform null and the slightly skewed null. For both of those distributions, none of the estimates of rWG were out-of-range.

fit1$rwg.out.of.range
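The reset logic itself is simple. This sketch assumes that resetting means replacing negative estimates with zero, as the text describes. It counts the out-of-range values first and truncates them only when reset is TRUE. The vector of estimates is hypothetical.

```r
# Hypothetical per-group rWG estimates, one of which is out of range
rwg <- c(A = 0.875, B = -0.667, C = 1.000, D = 0.200)
reset <- TRUE

# Count out-of-range (negative) estimates before any resetting
n_out <- sum(rwg < 0)

# Replace negative estimates with zero only when reset = TRUE
rwg_final <- if (reset) pmax(rwg, 0) else rwg
n_out
rwg_final
```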

The next element clarifies the number of scale points on our item and lists the specific point-estimates for the null error variances (based on different null distributions) that were used to compute estimates of rWG.

fit1$rwg.error.variances
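Each null error variance is the variance of a hypothesized response distribution. As a sketch, the variance of any discrete null can be computed from its category proportions. The uniform case reproduces (A² - 1)/12. The slightly skewed proportions (.05, .15, .20, .35, .25) are a commonly used set that represents a mild leniency bias. They are an assumption here and may differ from the values WGA uses.

```r
# Variance of a discrete null distribution over scale points 1..A,
# given the proportion of responses expected at each point
null_variance <- function(p) {
  stopifnot(isTRUE(all.equal(sum(p), 1)))  # proportions must sum to 1
  x  <- seq_along(p)                       # scale points 1, 2, ..., A
  mu <- sum(x * p)                         # mean of the null distribution
  sum(p * (x - mu)^2)                      # population variance
}

A <- 5
uniform     <- rep(1 / A, A)
slight_skew <- c(0.05, 0.15, 0.20, 0.35, 0.25)  # assumed leniency-skewed null

round(c(uniform     = null_variance(uniform),
        closed_form = (A^2 - 1) / 12,
        slight_skew = null_variance(slight_skew)), 2)
```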

The next element is a data frame containing information about group sizes and the rWG estimates for each group. This information may be easily merged with the original data.

fit1$rwg.results
lq2002 <- merge(x = lq2002, y = fit1$rwg.results, by.x = "COMPID", by.y = "grp.name")
names(lq2002)

The next element, rwg.plots, is empty. However, when the RWG function was executed, it automatically generated a set of histograms. These histograms visually depict the distribution of agreement under different response biases (e.g., a slightly skewed null distribution is used to account for a slight leniency or severity bias). See the output immediately following the code that generated fit1.

fit1$rwg.plots

The final element, rwg.p, contains the overall or omnibus estimates of rWG based on a pooled, within-groups estimate of variance. The advantage of rwg.p is that it furnishes a single estimate of agreement for an entire data set rather than separate estimates of agreement for each group. Thus, rwg.p gives researchers a sense of overall agreement based on the pooling of within-group variances. Based on the results, we see that using a uniform null, our estimate of rwgp.un = 0.50. This value drops considerably, to rwgp.ss = 0.25, when using a slightly skewed null distribution.

fit1$rwg.p

Overall, our results indicate that, on average, there is relatively weak agreement within groups. Examining fit1$rwg.descriptives and fit1$rwg.results indicates that some groups had moderate to strong agreement, but many groups had weak agreement. Thus, we would likely conclude that there is not sufficient agreement to warrant aggregating LEAD06 to the group level and treating it as an indicator of a group-level construct.
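A pooled estimate such as rwg.p can be sketched by substituting a pooled within-group variance for the single-group variance in the rWG formula. The pooling rule used below, sum((nj - 1)s²j) / sum(nj - 1), is an assumption, and WGA may pool differently. The toy data are hypothetical.

```r
# Hypothetical toy data: 3 groups of 4 respondents on a 5-point item
toy <- data.frame(
  grp  = rep(c("A", "B", "C"), each = 4),
  item = c(4, 4, 5, 4,   2, 5, 1, 4,   3, 3, 3, 3)
)

n_j  <- tapply(toy$item, toy$grp, length)  # group sizes
s2_j <- tapply(toy$item, toy$grp, var)     # within-group variances

# Pooled within-group variance, weighting each group by its degrees of freedom
s2_pooled <- sum((n_j - 1) * s2_j) / sum(n_j - 1)

# Pooled rWG against the uniform null for a 5-point scale
rwg_p_un <- 1 - s2_pooled / ((5^2 - 1) / 12)
round(rwg_p_un, 3)
```

With equal group sizes, as here, the pooled variance is simply the average of the group variances.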

Step 3b - Estimate Agreement on a Multi-Item Scale using rWG(J)

Next we demonstrate how to estimate within-group agreement for multiple-item scales. A separate function, RWGJ, is called to estimate rWG(J). The following code is used to estimate within-group agreement for the 11 items measuring soldier perceptions of leadership climate. These items are denoted "LEAD01" to "LEAD11" and occupy columns 3 to 13 in the lq2002 data frame. We have also changed the reset argument to request that any out-of-range estimates of rWG(J) be reset to zero, and raised the cutoff to 0.70.

fit2<-RWGJ(x=lq2002[,c(3:13)], grpid=lq2002$COMPID, scale=c(1,5), model="Consensus", reset=T, cutoff=0.70)
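For multi-item scales, rWG(J) combines the number of items J with the mean of the J item variances within a group. The sketch below applies the usual formula, rWG(J) = J(1 - s̄²/σ²EU) / [J(1 - s̄²/σ²EU) + s̄²/σ²EU], to a single hypothetical group of four respondents answering three items. The function name rwgj is hypothetical and distinct from the package's RWGJ.

```r
# rWG(J) for one group: items is a data frame with one column per item
rwgj <- function(items, sigma2_eu) {
  J <- ncol(items)
  ratio <- mean(apply(items, 2, var)) / sigma2_eu  # mean item variance / null
  (J * (1 - ratio)) / (J * (1 - ratio) + ratio)
}

# Hypothetical responses from one group on three 5-point items
grp_items <- data.frame(
  i1 = c(4, 4, 5, 4),
  i2 = c(4, 5, 5, 4),
  i3 = c(3, 4, 4, 4)
)

rwgj(grp_items, sigma2_eu = (5^2 - 1) / 12)
```

When the mean item variance exceeds σ²EU, the ratio exceeds 1. The formula can then return values below 0 or even above 1 (e.g., a ratio of 2 with J = 3 gives 3), which is the situation the reset argument addresses.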

Step 4b - Examine Results

We can examine individual elements in the results list or we can simply review all of the results for a particular analysis. For example, to review the results for rWG(J), we would just type fit2 into the console and hit 'enter'.

fit2

WGA: An Integrated Function for Estimating Within-Group Agreement

Although some users may prefer to estimate agreement using the separate functions described in the previous section, we recommend using the WGA function, which is a wrapper function that combines the previous analyses into a single set of output. The WGA function also computes additional estimates of agreement (e.g., AD, aWG) which may be of interest to some users.

Below is the code for estimating agreement on the LEAD06 item using the WGA function.

fit3 <- WGA(x=lq2002[,c("LEAD06")], grpid=lq2002$COMPID, scale=c(1,5), 
            model="Consensus", reset=T, cutoff=0.70)
fit3
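The WGA output includes AD and aWG in addition to rWG. The sketch below uses the usual definitions, not WGA's code. ADM is the mean absolute deviation of responses from the group mean. Single-item aWG is 1 - 2s² / ([(H + L)M - M² - HL] × [k / (k - 1)]), where M is the group mean, L and H are the scale endpoints, and k is the group size. The helper names and the two toy groups are hypothetical.

```r
# AD around the mean: average absolute deviation from the group mean
ad_mean <- function(x) mean(abs(x - mean(x)))

# Single-item aWG: observed variance relative to the maximum variance possible
# given the group mean M on an L-to-H scale; undefined when M equals L or H
awg_item <- function(x, L, H) {
  k <- length(x)
  M <- mean(x)
  max_var <- ((H + L) * M - M^2 - H * L) * (k / (k - 1))
  1 - (2 * var(x)) / max_var
}

# Two hypothetical groups rated on a 1-5 scale
g1 <- c(4, 4, 5, 4)
g2 <- c(2, 5, 1, 4)
rbind(g1 = c(AD = ad_mean(g1), aWG = awg_item(g1, L = 1, H = 5)),
      g2 = c(AD = ad_mean(g2), aWG = awg_item(g2, L = 1, H = 5)))
```

The two indices run in opposite directions: a larger AD means less agreement, whereas a larger aWG means more agreement.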

Aggregating the Data

If the results of the within-group agreement analyses justify aggregating lower-level scores to a higher level, researchers may use a new function called AGGREGATE.DATA. This function extends the aggregate function in the stats package. With aggregate alone, researchers must first aggregate lower-level data to a higher level, then rename variables, and finally merge the lower-level and higher-level data sets. AGGREGATE.DATA integrates all of those steps. We illustrate this function using the scale scores on the 11-item leadership climate questionnaire, denoted LEAD in the lq2002 data. Options for the aggregation statistic (aggr.stat) include mean, median, var, sd, min, and max. The code below then inspects the combined data set, df.combined.

AGGREGATE.DATA(data = lq2002, grpid = "COMPID", x = c("LEAD"), aggr.stat = "mean")
names(df.combined)
head(df.combined)
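For comparison, the three separate steps that AGGREGATE.DATA combines can be sketched in base R with aggregate() and merge(). The toy data and the ".mean" suffix for the group-level variable are hypothetical. AGGREGATE.DATA may name its columns differently.

```r
# Hypothetical individual-level data: 3 people in each of 2 companies
ind <- data.frame(
  COMPID = c(1, 1, 1, 2, 2, 2),
  LEAD   = c(3.0, 3.5, 4.0, 2.0, 2.5, 3.0)
)

# Step 1: aggregate the individual scores to the group level
grp <- aggregate(LEAD ~ COMPID, data = ind, FUN = mean)

# Step 2: rename the group-level variable so it is distinct from LEAD
names(grp)[names(grp) == "LEAD"] <- "LEAD.mean"

# Step 3: merge the group-level scores back onto the individual-level data
combined <- merge(ind, grp, by = "COMPID")
combined
```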


james-lebreton/WGA documentation built on Oct. 17, 2022, 4:05 a.m.