granovagg.1w: Elemental Graphic Display for One-Way ANOVA

Description

Graphic to display data for a one-way analysis of variance, that is, for unstructured groups. It also helps show how the data play out in the context of the basic one-way model and how the F statistic is generated for the data at hand. The graphic may be called 'elemental' or 'natural' because it is built upon the central question that drives one-way ANOVA (see Details below).

Usage

granovagg.1w(data, group = NULL, h.rng = 1, v.rng = 1, jj = NULL,
  dg = 2, resid = FALSE, print.squares = TRUE, xlab = "default_x_label",
  ylab = "default_y_label", main = "default_granova_title",
  plot.theme = "theme_granova_1w", ...)

Arguments

data

Dataframe or vector. If a dataframe, the two or more columns are taken to be groups of equal size (whence group is NULL). If data is a vector, group must be a vector, perhaps a factor, that indicates groups (unequal group sizes allowed with this option).

group

Group indicator, generally a factor in case data is a vector.

h.rng

Numeric; controls the horizontal spread of groups, default = 1

v.rng

Numeric; controls the vertical spread of points, default = 1.

jj

Numeric; sets the horizontal jittering level of points. jj is passed as the amount parameter to jitter. When jj = NULL (the default), the degree of jitter takes on a sensible value. In addition, if pairs of ordered means are close to one another and jj = NULL, the degree of jitter defaults to the smallest difference between two adjacent contrasts.

dg

Numeric; sets the number of decimal places in the output display, default = 2.

resid

Logical; displays the marginal distribution of residuals (as a 'rug') on the right side, with respect to the grand mean; default = FALSE.

print.squares

Logical; displays graphical squares for visualizing the F-statistic as a ratio of MS-between to MS-within, default = TRUE.

xlab

Character; horizontal axis label, can be supplied by user, default = "default_x_label", which leads to a generic x-axis label ("Contrast coefficients based on group means").

ylab

Character; vertical axis label, can be supplied by user, default = "default_y_label", which leads to a generic y-axis label ("Dependent variable (response)").

main

Character; main label, top of graphic; can be supplied by user, default = "default_granova_title", which will print a generic title for graphic.

plot.theme

argument indicating a ggplot2 theme to apply to the graphic; defaults to a customized theme created for the one-way graphic

...

Optional arguments to/from other functions
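
The two input forms described for data and group can be sketched with base R alone. This is only an illustration: the toy values and the object names (wide, long) are hypothetical, and the granovagg.1w calls are left as comments so the snippet runs without the package installed.

```r
# Hypothetical toy data: two groups of equal size stored as columns.
wide <- data.frame(Placebo = c(19.1, 20.4, 22.0, 21.3),
                   Drug.A  = c(23.5, 25.1, 24.0, 26.2))

# Form 1 (data frame): columns are the groups, so `group` stays NULL.
#   granovagg.1w(wide)

# Form 2 (vector + factor): stack() from base R reshapes the columns into
# a response vector (`values`) and a group factor (`ind`).
long <- stack(wide)
#   granovagg.1w(long$values, group = long$ind)

# Both forms describe the same groups and the same group means.
means_wide <- colMeans(wide)
means_long <- tapply(long$values, long$ind, mean)
print(means_long)
```

The vector-plus-factor form is the one to use when group sizes differ, since a data frame forces every column to the same length.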

Details

The one-way ANOVA graphic shows how the comparison of unstructured groups, viz. their means, entails a particular linear combination (L.C.) of the group means. In particular, we use the fact that the numerator of the one-way F statistic, the mean square between (MS.B), is a linear combination of the group means; each weight (one for each group) in the L.C. is principally a function of the difference between the group's mean and the grand mean, viz. (M_j - M..), where M_j denotes the jth group's mean and M.. denotes the grand mean. The L.C. can be written as a sum of products of the form MS.B = Sum((1/df.B)(n_j (M_j - M..) M_j)) for j = 1...J, where the n_j are the group sizes. The denominator of the F statistic, MS.W (mean square within), can be described as a 'scaling factor'. It is just the (weighted) average of the variances of the J groups.

The differences (M_j - M..) are themselves the 'effects' in the analysis. When the effects (horizontal axis) are plotted against the group means (vertical axis), a straight line necessarily ensues. Group means are plotted as triangles along this line. Once the means have been plotted, the jittered data points for each group are displayed (vertical axis) at that group's contrast. Since the group means are just the fitted values in one-way ANOVA, and the deviations of the scores within groups are the residuals (subsetted by groups), the graphic can be seen as showing fitted vs. residual values for the line that shows the locus of ordered group means, from the smallest (on the left) to the largest (on the right). If desired, the aggregate of all such residuals can be plotted (as a rug plot) on the right margin of the graphic, centered on the grand mean (large green dot in the 'middle'). The use of effects to locate groups this way yields what we term an 'elemental' graphic because it is based on the central question that drives one-way ANOVA.
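
The arithmetic described above can be checked by hand. The following is a minimal sketch in base R, using the built-in chickwts data; it is not the package's internal code, and the variable names (n_j, M_j, ms_b, ms_w) are chosen here for illustration. It assumes the weights in the 'weighted average' of variances are the within-group degrees of freedom (n_j - 1), which is the usual pooled-variance definition of MS.W.

```r
# Built-in chickwts data: chick weight by feed type (unequal group sizes).
y <- chickwts$weight
g <- chickwts$feed

n_j   <- tapply(y, g, length)   # group sizes n_j
M_j   <- tapply(y, g, mean)     # group means M_j
var_j <- tapply(y, g, var)      # group variances
M     <- mean(y)                # grand mean M..
J     <- length(n_j)            # number of groups
N     <- length(y)              # total sample size

effects <- M_j - M              # the 'effects' (contrasts)

# MS.B as a linear combination of the group means,
# with weights n_j * (M_j - M..) / df.B
df_B <- J - 1
ms_b <- sum(n_j * effects * M_j) / df_B

# MS.W as the average of the group variances, weighted by (n_j - 1)
df_W <- N - J
ms_w <- sum((n_j - 1) * var_j) / df_W

f_stat <- ms_b / ms_w

# Cross-check against the usual ANOVA table
f_ref <- anova(lm(weight ~ feed, data = chickwts))[["F value"]][1]
print(c(by_hand = f_stat, anova = f_ref))
```

The sum of n_j (M_j - M..) M_j equals the between-groups sum of squares because the weights n_j (M_j - M..) sum to zero, so subtracting M.. from each M_j leaves the total unchanged.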

Note that groups need not have the same size, nor do data need to reflect any particular distributional characteristics. Finally, the gray bars (one for each group) at the bottom of the graphic show the relative sizes of the group standard deviations with reference to the 'average' group s.d. (more precisely, the square root of the MS.W). This 'average' corresponds to the thin white line that runs horizontally across these bars.
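
The comparison that the gray bars encode can be approximated numerically. This is a sketch under the assumption that each bar reflects the ratio of a group's s.d. to the square root of MS.W; how the package actually scales the bars is not stated here, and the names pooled_sd and sd_ratio are illustrative.

```r
y <- chickwts$weight
g <- chickwts$feed

sd_j <- tapply(y, g, sd)        # group standard deviations
n_j  <- tapply(y, g, length)    # group sizes

# 'Average' s.d.: square root of MS.W (the pooled within-group variance)
pooled_sd <- sqrt(sum((n_j - 1) * sd_j^2) / (length(y) - length(n_j)))

# Ratios above 1 mark groups that are more variable than average
sd_ratio <- sd_j / pooled_sd
print(round(sort(sd_ratio), 2))
```

The value of pooled_sd agrees with the residual standard error reported by a linear model of weight on feed.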

Value

Returns a plot object of class ggplot. The function also provides printed output including by-group statistical summaries and information about groups that might be overplotted (if applicable):

group

group names

group means

means for each group

trimmed.mean

20% trimmed group means

contrast

Contrasts (group main effects)

variance

variances

standard.deviation

standard deviations

group.size

group sizes

overplotting information

Information about groups that, due to their close means, may be overplotted
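
A by-group summary with the columns listed above can be rebuilt with base R. This is a rough sketch, not the package's own routine: it assumes the trimmed mean is mean(x, trim = 0.2) and that the contrast is the group mean minus the grand mean, and it does not attempt the overplotting check.

```r
y <- chickwts$weight
g <- chickwts$feed

group_mean <- tapply(y, g, mean)

summary_tbl <- data.frame(
  group              = names(group_mean),
  group.mean         = as.numeric(group_mean),
  trimmed.mean       = as.numeric(tapply(y, g, mean, trim = 0.2)),
  contrast           = as.numeric(group_mean - mean(y)),
  variance           = as.numeric(tapply(y, g, var)),
  standard.deviation = as.numeric(tapply(y, g, sd)),
  group.size         = as.numeric(tapply(y, g, length))
)

# Order rows by group mean, as in the printed output, and round for display
summary_tbl <- summary_tbl[order(summary_tbl$group.mean), ]
summary_tbl[-1] <- round(summary_tbl[-1], 2)
print(summary_tbl)
```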

Author(s)

Brian A. Danielak brian@briandk.com
Robert M. Pruzek RMPruzek@yahoo.com

with contributions by:
William E. J. Doane wil@drdoane.com
James E. Helmreich James.Helmreich@Marist.edu
Jason Bryer jason@bryer.org

References

Hoaglin, D., Mosteller, F., and Tukey, J. (eds.) (1991). Fundamentals of Exploratory Analysis of Variance. Wiley.

Wickham, H. (2009). ggplot2: Elegant Graphics for Data Analysis. New York: Springer.

Wilkinson, L. (1999). The Grammar of Graphics. Statistics and computing. New York: Springer.

See Also

granovagg.contr, granovagg.ds, granovaGG

Examples

data(arousal)
#Drug A
granovagg.1w(arousal[,1:2], h.rng = 1.6, v.rng = 0.5)

###

data(anorexia)
wt.gain <- anorexia[, 3] - anorexia[, 2]
granovagg.1w(wt.gain, group = anorexia[, 1])

###

data(poison)
##Note violation of constant variance across groups in following graphic.
granovagg.1w(poison$SurvTime, group = poison$Group, ylab = "Survival Time")
##RateSurvTime = SurvTime^-1
granovagg.1w(poison$RateSurvTime, group = poison$Group,
  ylab = "Survival Rate = Inverse of Survival Time")

##Nonparametric version: RateSurvTime ranked and rescaled
##to be comparable to RateSurvTime;
##note labels as well as residual (rug) plot below.
granovagg.1w(poison$RankRateSurvTime, group = poison$Group,
  ylab = "Ranked and Centered Survival Rates",
  main = "One-way ANOVA display, poison data (ignoring 2-way set-up)",
  resid = TRUE) # spelled out in full rather than relying on partial matching of 'res'

###

data(chickwts)
?chickwts # An explanation of the chickwts dataset
with(chickwts, granovagg.1w(weight, group = feed)) # Modeling weight as explained by feed type

Example output

Loading required package: ggplot2

By-group summary statistics for your input data (ordered by group means)
    group group.mean trimmed.mean contrast variance standard.deviation
1 Placebo      20.43        20.30    -1.92     5.83               2.41
2  Drug.A      24.27        24.45     1.92     7.89               2.81
  group.size
1         10
2         10

Below is a t-test summary of your input data

	Two Sample t-test

data:  unstacked.data[, 1] and unstacked.data[, 2]
t = -3.2786, df = 18, p-value = 0.004174
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -6.300681 -1.379319
sample estimates:
mean of x mean of y 
    20.43     24.27 


By-group summary statistics for your input data (ordered by group means)
  group group.mean trimmed.mean contrast variance standard.deviation group.size
2  Cont      -0.45        -1.16    -3.21    63.82               7.99         26
1   CBT       3.01         1.80     0.24    53.41               7.31         29
3    FT       7.26         7.91     4.50    51.23               7.16         17

Below is a linear model summary of your input data

Call:
lm(formula = score ~ group, data = owp$data)

Residuals:
    Min      1Q  Median      3Q     Max 
-12.565  -4.543  -1.007   3.846  17.893 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept)    3.007      1.398   2.151   0.0350 *
groupCont     -3.457      2.033  -1.700   0.0936 .
groupFT        4.258      2.300   1.852   0.0684 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.528 on 69 degrees of freedom
Multiple R-squared:  0.1358,	Adjusted R-squared:  0.1108 
F-statistic: 5.422 on 2 and 69 DF,  p-value: 0.006499


By-group summary statistics for your input data (ordered by group means)
   group group.mean trimmed.mean contrast variance standard.deviation
3      3       0.21         0.21    -0.27     0.00               0.02
9      9       0.24         0.24    -0.24     0.00               0.01
2      2       0.32         0.32    -0.16     0.01               0.08
12    12       0.32         0.32    -0.15     0.00               0.03
6      6       0.34         0.34    -0.14     0.00               0.05
8      8       0.38         0.38    -0.10     0.00               0.06
1      1       0.41         0.41    -0.07     0.00               0.07
7      7       0.57         0.57     0.09     0.02               0.16
10    10       0.61         0.61     0.13     0.01               0.11
11    11       0.67         0.67     0.19     0.07               0.27
5      5       0.82         0.82     0.34     0.11               0.34
4      4       0.88         0.88     0.40     0.03               0.16
   group.size
3           4
9           4
2           4
12          4
6           4
8           4
1           4
7           4
10          4
11          4
5           4
4           4

The following groups are likely to be overplotted
   group group.mean contrast
2      2       0.32    -0.16
12    12       0.32    -0.15
6      6       0.34    -0.14

Below is a linear model summary of your input data

Call:
lm(formula = score ~ group, data = owp$data)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.32500 -0.04875  0.00500  0.04312  0.42500 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.41250    0.07457   5.532 2.94e-06 ***
group2      -0.09250    0.10546  -0.877 0.386230    
group3      -0.20250    0.10546  -1.920 0.062781 .  
group4       0.46750    0.10546   4.433 8.37e-05 ***
group5       0.40250    0.10546   3.817 0.000513 ***
group6      -0.07750    0.10546  -0.735 0.467163    
group7       0.15500    0.10546   1.470 0.150304    
group8      -0.03750    0.10546  -0.356 0.724219    
group9      -0.17750    0.10546  -1.683 0.101000    
group10      0.19750    0.10546   1.873 0.069235 .  
group11      0.25500    0.10546   2.418 0.020791 *  
group12     -0.08750    0.10546  -0.830 0.412164    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1491 on 36 degrees of freedom
Multiple R-squared:  0.7335,	Adjusted R-squared:  0.6521 
F-statistic:  9.01 on 11 and 36 DF,  p-value: 1.986e-07


By-group summary statistics for your input data (ordered by group means)
   group group.mean trimmed.mean contrast variance standard.deviation
4      4       1.16         1.16    -1.46     0.04               0.20
5      5       1.39         1.39    -1.23     0.31               0.55
10    10       1.69         1.69    -0.93     0.13               0.36
11    11       1.70         1.70    -0.92     0.49               0.70
7      7       1.86         1.86    -0.76     0.24               0.49
1      1       2.49         2.49    -0.14     0.25               0.50
8      8       2.71         2.71     0.09     0.17               0.42
6      6       3.03         3.03     0.41     0.18               0.42
12    12       3.09         3.09     0.47     0.06               0.24
2      2       3.27         3.27     0.65     0.68               0.82
9      9       4.26         4.26     1.64     0.06               0.23
3      3       4.80         4.80     2.18     0.28               0.53
   group.size
4           4
5           4
10          4
11          4
7           4
1           4
8           4
6           4
12          4
2           4
9           4
3           4

The following groups are likely to be overplotted
   group group.mean contrast
10    10       1.69    -0.93
11    11       1.70    -0.92
6      6       3.03     0.41
12    12       3.09     0.47

Below is a linear model summary of your input data

Call:
lm(formula = score ~ group, data = owp$data)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.76848 -0.29639 -0.06915  0.25455  1.07933 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   2.4869     0.2450  10.151 4.16e-12 ***
group2        0.7816     0.3465   2.256 0.030247 *  
group3        2.3158     0.3465   6.684 8.56e-08 ***
group4       -1.3234     0.3465  -3.820 0.000508 ***
group5       -1.0935     0.3465  -3.156 0.003226 ** 
group6        0.5421     0.3465   1.565 0.126414    
group7       -0.6242     0.3465  -1.801 0.080010 .  
group8        0.2270     0.3465   0.655 0.516468    
group9        1.7781     0.3465   5.132 1.00e-05 ***
group10      -0.7972     0.3465  -2.301 0.027299 *  
group11      -0.7853     0.3465  -2.267 0.029517 *  
group12       0.6049     0.3465   1.746 0.089344 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.49 on 36 degrees of freedom
Multiple R-squared:  0.8681,	Adjusted R-squared:  0.8277 
F-statistic: 21.53 on 11 and 36 DF,  p-value: 1.289e-12


By-group summary statistics for your input data (ordered by group means)
   group group.mean trimmed.mean contrast variance standard.deviation
4      4       1.11         1.11    -1.38     0.03               0.18
5      5       1.36         1.36    -1.13     0.28               0.53
10    10       1.67         1.67    -0.82     0.10               0.31
11    11       1.69         1.69    -0.80     0.50               0.71
7      7       1.82         1.82    -0.67     0.24               0.49
1      1       2.39         2.39    -0.10     0.30               0.55
8      8       2.72         2.72     0.23     0.19               0.44
6      6       3.04         3.04     0.55     0.18               0.42
12    12       3.09         3.09     0.61     0.05               0.22
2      2       3.15         3.15     0.66     0.39               0.62
9      9       3.78         3.78     1.29     0.03               0.16
3      3       4.04         4.04     1.55     0.03               0.16
   group.size
4           4
5           4
10          4
11          4
7           4
1           4
8           4
6           4
12          4
2           4
9           4
3           4

The following groups are likely to be overplotted
   group group.mean contrast
10    10       1.67    -0.82
11    11       1.69    -0.80
6      6       3.04     0.55
12    12       3.09     0.61
2      2       3.15     0.66

Below is a linear model summary of your input data

Call:
lm(formula = score ~ group, data = owp$data)

Residuals:
    Min      1Q  Median      3Q     Max 
-0.7375 -0.2900 -0.0375  0.2606  0.9225 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   2.3925     0.2195  10.899 5.93e-13 ***
group2        0.7550     0.3105   2.432 0.020121 *  
group3        1.6425     0.3105   5.291 6.16e-06 ***
group4       -1.2825     0.3105  -4.131 0.000205 ***
group5       -1.0300     0.3105  -3.318 0.002083 ** 
group6        0.6475     0.3105   2.086 0.044157 *  
group7       -0.5775     0.3105  -1.860 0.071043 .  
group8        0.3250     0.3105   1.047 0.302141    
group9        1.3900     0.3105   4.477 7.33e-05 ***
group10      -0.7225     0.3105  -2.327 0.025691 *  
group11      -0.7050     0.3105  -2.271 0.029235 *  
group12       0.7025     0.3105   2.263 0.029775 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.439 on 36 degrees of freedom
Multiple R-squared:  0.8542,	Adjusted R-squared:  0.8097 
F-statistic: 19.18 on 11 and 36 DF,  p-value: 7.233e-12

chickwts               package:datasets                R Documentation

Chicken Weights by Feed Type

Description:

     An experiment was conducted to measure and compare the
     effectiveness of various feed supplements on the growth rate of
     chickens.

Usage:

     chickwts
     
Format:

     A data frame with 71 observations on the following 2 variables.

     'weight' a numeric variable giving the chick weight.

     'feed' a factor giving the feed type.

Details:

     Newly hatched chicks were randomly allocated into six groups, and
     each group was given a different feed supplement.  Their weights
     in grams after six weeks are given along with feed types.

Source:

     Anonymous (1948) _Biometrika_, *35*, 214.

References:

     McNeil, D. R. (1977) _Interactive Data Analysis_.  New York:
     Wiley.

Examples:

     require(stats); require(graphics)
     boxplot(weight ~ feed, data = chickwts, col = "lightgray",
         varwidth = TRUE, notch = TRUE, main = "chickwt data",
         ylab = "Weight at six weeks (gm)")
     anova(fm1 <- lm(weight ~ feed, data = chickwts))
     opar <- par(mfrow = c(2, 2), oma = c(0, 0, 1.1, 0),
                 mar = c(4.1, 4.1, 2.1, 1.1))
     plot(fm1)
     par(opar)
     


By-group summary statistics for your input data (ordered by group means)
      group group.mean trimmed.mean contrast variance standard.deviation
2 horsebean     160.20       154.33  -101.11  1491.96              38.63
3   linseed     218.75       219.50   -42.56  2728.57              52.24
5   soybean     246.43       246.50   -14.88  2929.96              54.13
4  meatmeal     276.91       280.43    15.60  4212.09              64.90
1    casein     323.58       331.38    62.27  4151.72              64.43
6 sunflower     328.92       326.38    67.61  2384.99              48.84
  group.size
2         10
3         12
5         14
4         11
1         12
6         12

Below is a linear model summary of your input data

Call:
lm(formula = score ~ group, data = owp$data)

Residuals:
     Min       1Q   Median       3Q      Max 
-123.909  -34.413    1.571   38.170  103.091 

Coefficients:
               Estimate Std. Error t value Pr(>|t|)    
(Intercept)     323.583     15.834  20.436  < 2e-16 ***
grouphorsebean -163.383     23.485  -6.957 2.07e-09 ***
grouplinseed   -104.833     22.393  -4.682 1.49e-05 ***
groupmeatmeal   -46.674     22.896  -2.039 0.045567 *  
groupsoybean    -77.155     21.578  -3.576 0.000665 ***
groupsunflower    5.333     22.393   0.238 0.812495    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 54.85 on 65 degrees of freedom
Multiple R-squared:  0.5417,	Adjusted R-squared:  0.5064 
F-statistic: 15.36 on 5 and 65 DF,  p-value: 5.936e-10

granovaGG documentation built on May 2, 2019, 2:09 a.m.