Mean Inference
In lessR: Less Code, More Results

suppressPackageStartupMessages(library("lessR"))

First read the Employee data included as part of lessR.

d <- Read("Employee")

As an option, also read the table of variable labels. Create the table formatted as two columns. The first column is the variable name and the second column is the corresponding variable label. Not all variables need to be entered into the table. The table can be a csv file or an Excel file.

Currently, read the label file into the l data frame. The labels are displayed on both the text and visualization output. Each displayed label consists of the variable name juxtaposed with the corresponding label.

l <- rd("Employee_lbl")

One-Sample t-test

Obtain the summary statistics and 95% confidence interval for a single variable by specifying that variable with ttest(). Because d is the default name of the data frame that contains the variables for analysis, the data parameter that names the input data frame need not be specified.

ttest(Salary)

Add a hypothesis test to the above.

ttest(Salary, mu=52000)

Analysis of the above from summary statistics only.

ttest(n=37, m=73795.557, s=21799.533, Ynm="Salary", mu=52000)

Two-Samples t-test

Independent Groups

Full analysis with ttest() function, abbreviated as tt(), with formula mode.

ttest(Salary ~ Gender)

Brief version of the output contains just the basics.

tt_brief(Salary ~ Gender)

Dependent Groups

tt_brief(Pre, Post)

ANOVA

Analysis of variance applies to the inferential analysis of means across groups. The lessR function ANOVA(), abbreviated av(), provides this analysis, based on the base R function aov().

The data for these examples is the warpbreaks data set included with the R datasets package. The data are from a weaving device called a loom for a fixed length of yarn. The response variable is the number of times the yarn broke during the weaving. Independent variables are the type of wool -- A or B --and the level of tension -- L, M, or H.

Because warpbreaks is not the default data frame, specify with the data parameter (or set d equal to warpbreaks).

One-way Independent Groups

First, for illustrative purposes, ignore the type of wool and only examine the impact of tension on breaks.

The output includes descriptive statistics, ANOVA table, effect size indices, Tukey's multiple comparisons of means, and residuals, as well as the scatterplot of the response variable with the levels of the independent variable, and a visualization of the mean comparisons.

ANOVA(breaks ~ tension, data=warpbreaks)

The brief version forgoes the multiple comparisons and the residuals.

av_brief(breaks ~ tension, data=warpbreaks)

Two-way Independent Groups

Specify the second independent variable preceded by a * sign. The plot of the cell means is generated automatically.

ANOVA(breaks ~ tension * wool, data=warpbreaks)

Can also obtain the cell mean plot directly from the means. Here use lessR pivot() to compute the cell means of breaks across tension and wool.

data(warpbreaks)
dm <- pivot(warpbreaks, mean, breaks, c(tension, wool))
dm

Store the aggregated data in the data frame named dm, so explicitly identify with the data parameter. The computed mean of breaks variable in the dm data frame from the previous call to pivot() is named mean by default.

Plot(tension, breaks_mean, by=wool, segments=TRUE, size=2, data=dm,
     main="Cell Means")

Randomized Block Design

The randomized block design has a treatment variable, usually administered over time, and a blocking variable. The values of the treatment variable are measured across each instance of the blocking variable. In this example, repetitions are measured across four different workout sessions. The person takes one of four supplements before each session. Person is the blocking variable, and Supplement is the treatment variable. Repetitions is the response variable.

The data are presented in a wide-form data table, a single row for each person.

d <- read.csv(header=TRUE, text="
Person,sup1,sup2,sup3,sup4
p1,2,4,4,3
p2,2,5,4,6
p3,8,6,7,9
p4,4,3,5,7
p5,2,1,2,3
p6,5,5,6,8
p7,2,3,2,4")

The ANOVA, however, requires data to be in long-form. Reshape data from wide form to long form with base R reshape() according to the following parameters. With each parameter, either identify existing variables in the given wide-form data, or name newly created variables in the long-form. This R function refers to a time-variable, which in the context of ANOVA is the treatment variable, of which the values occur over time: first treatment, second treatment, etc.

The reshaping from a wide-form to a long-form data table creates two new variables: the variable whose values are collected over time, here the blocking variable, Supplement, and the response variable, here Reps.

idvar: Identify the existing blocking (within) variable in the wide-form data
varying: Identify the wide-form variables, which occur over time, to be gathered into a single variable in long-format
timevar: Name the corresponding treatment (factor) variable in the created long-form
v.names: Name the response variable in the created long-form

There are many ways to identify the names of the wide-form variables to be gathered into a single time-oriented long-form variable. The most general is to specify a vector of the names, here

c("sup1", "sup2" "sup3", "sup4")

In this example use the lessR to function to create that vector without needed to individually list each variable.

to("sup", 4)

d <- reshape(d, direction="long",
        idvar="Person", varying=list(to("sup", 4)), 
        timevar="Supplement", v.names="Reps")

Do not need the row names, so remove before displaying new long-form data.

row.names(d) <- NULL
d[1:10,]

To run the ANOVA, specify the blocking variable preceded by a + sign.

ANOVA(Reps ~ Supplement + Person)

Full Manual

Use the base R help() function to view the full manual for ttest() or ANOVA(). Simply enter a question mark followed by the name of the function.