Many times in an analysis, multiple variables in the data will be measuring the same quantity. For example, in the `mri`

data available at Scott Emerson's website and documented on the same page, both the `yrsquit`

and `packyrs`

variables measure the amount of smoking that a person does.

To fully analyze these variables, we need to run multiple-partial F-tests. Prior to the `uwIntroStats`

package, the process to perform these tests involved more code than was necessary. First the user had to create a linear model (or perhaps multiple linear models), and then run an ANOVA test.

Now, using the `U()`

function, the user can specify multiple-partial F-tests within a call to `regress()`

, the regression function supplied by `uwIntroStats`

. A full explanation of that function can be found in "Regression in uwIntroStats".

This document provides an introduction to using the `U()`

function as a supplement to regression analyses. In each case, we will use linear regression to avoid confusion, and leave all of the arguments to `regress()`

up to its own vignette.

`U()`

functionTo continue our example above, if we want to describe the association between cerebral atrophy and smoking and age using linear regression, we would have to use both the `yrsquit`

and `packyrs`

variables, in addition to the `age`

variable. But as we already described, the former two both measure smoking habits, and thus are truly one variable.

The `U()`

function only requires a formula when it is used to create a multiple-partial F-test. However, this is not a usual formula, because the response variable has already been defined in the outer formula in the call to `regress()`

. For example, the formula given to `regress()`

without the multiple-partial F-test would follow the usual convention of `lm()`

.

atrophy ~ age + packyrs + yrsquit

Now if we want to make the F-test, we give `U()`

the formula

~ packyrs + yrsquit

and it knows to use the response variable `atrophy`

. In fact, an error will be returned if a response variable is entered to the `U()`

formula.

Now we can run the regression.

library(uwIntroStats) data(mri) regress("mean", atrophy ~ age + U(~packyrs + yrsquit), data = mri)

The regression output indicates that the variable for smoking should be in the model. The F-statistic for the multiple-partial F-test, which tests that the `packyrs`

and `yrsquit`

coefficient estimates are simultaneously equal to zero, is 4.37 with a p-value of less than 0.05. Thus we would conclude that both age and smoking are associated with cerebral atrophy. For a full example of the inference we would make from this model, see the vignette for using `regress()`

.

`U()`

In our example above, we stated that both variables were actually measuring smoking habits. Thus in our regression call we could name this group to have more informative output. The `U()`

function allows us to name the groups by placing an "=" before the tilde in the formula, and assigning a name on the left. In our example above, we could name the group "smoke" by writing

U(smoke = ~packyrs + yrsquit)

This would return the following output.

regress("mean", atrophy ~ age + U(smoke = ~packyrs + yrsquit), data = mri)

This is more informative than above, because now we are immediately reminded that `yrsquit`

and `packyrs`

are measuring smoking history when we look at the output.

**Any scripts or data that you put into this service are public.**

Embedding an R snippet on your website

Add the following code to your website.

For more information on customizing the embed code, read Embedding Snippets.