# rrr for Multivariate Regression

```r
library(rrr)
```


## Classical Multivariate Regression

Let $\mathbf{X} = \left(X_1, X_2, \dots, X_r\right)^\tau$ and $\mathbf{Y} = \left(Y_1, Y_2, \dots, Y_s\right)^\tau$ be random vectors of predictors and responses, respectively. The classical multivariate regression model is given by

$$\overset{s \times 1}{\mathbf{Y}} = \overset{s \times 1}{\boldsymbol{\mu}} + \overset{s \times r}{\mathbf{C}} \; \overset{r \times 1}{\mathbf{X}} + \overset{s \times 1}{\varepsilon}$$

with

$$\mathrm{E}\left(\varepsilon\right) = \mathbf{0}, \quad \mathrm{cov}\left(\varepsilon\right) = \mathbf{\Sigma}_{\varepsilon \varepsilon}$$

and $\varepsilon$ is distributed independently of $\mathbf{X}.$

To estimate $\boldsymbol{\mu}$ and $\mathbf{C}$ we minimize the least-squares criterion

$$\mathrm{E}\left[\left(\mathbf{Y} - \boldsymbol{\mu} - \mathbf{C} \mathbf{X}\right)\left(\mathbf{Y} - \boldsymbol{\mu} - \mathbf{C}\mathbf{X}\right)^\tau\right],$$

with expectation taken over the joint distribution of $\left(\mathbf{X}^\tau, \mathbf{Y}^\tau\right)$, under the assumption that $\mathbf{\Sigma}_{XX}$ is nonsingular, and therefore invertible.

This is minimized when

\begin{aligned} \boldsymbol{\mu} & = \boldsymbol{\mu}_Y - \mathbf{C} \boldsymbol{\mu}_X \\ \mathbf{C} & = \mathbf{\Sigma}_{YX} \mathbf{\Sigma}_{XX}^{-1} \end{aligned}

The least-squares estimator of $\mathbf{C}$ is given by

$$\hat{\mathbf{C}} = \hat{\mathbf{\Sigma}}_{YX} \hat{\mathbf{\Sigma}}_{XX}^{-1}$$

Note that $\mathbf{C}$, and hence $\hat{\mathbf{C}}$, contains no term that takes into account the correlation among the $Y_i$s. This is a surprising result, since we would expect correlation among the responses.

In other words, to find the least-squares estimate $\hat{\mathbf{C}}$ of $\mathbf{C}$, one need only regress each $Y_i$ separately on $\mathbf{X}$ and bind the resulting multiple-regression coefficient vectors together into the estimated coefficient matrix $\hat{\mathbf{C}}$.

In this sense, the classical multivariate regression model is not truly multivariate.
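This equivalence is easy to check numerically. A minimal sketch on simulated data (illustrative only, not the vignette's data):

```r
# Check: the least-squares coefficient matrix Sigma_YX %*% Sigma_XX^{-1}
# equals the slopes from separate multiple regressions of each response on X.
set.seed(1)
n <- 100
X <- matrix(rnorm(n * 3), n, 3)                                     # r = 3 predictors
Y <- X %*% matrix(rnorm(3 * 2), 3, 2) + matrix(rnorm(n * 2), n, 2)  # s = 2 responses

C_hat <- cov(Y, X) %*% solve(cov(X))                    # 2 x 3 coefficient matrix
C_sep <- t(sapply(1:2, function(j) lm(Y[, j] ~ X)$coefficients[-1]))

all.equal(unname(C_hat), unname(C_sep))                 # TRUE
```

The two coefficient matrices agree to numerical precision, since each row of $\hat{\mathbf{C}}$ is exactly a separate multiple-regression slope vector.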

## The tobacco Data Set

```r
library(dplyr)
data(tobacco)

tobacco <- as_data_frame(tobacco)

glimpse(tobacco)
```


We see that the tobacco data set^[Anderson, R.L and Bancroft, T.A (1952). Statistical Theory in Research, New York: McGraw-Hill. p. 205. ] has 9 variables and 25 observations. There are 6 $X_i$ predictor variables -- representing the percentages of nitrogen, chlorine, potassium, phosphorus, calcium, and magnesium, respectively -- and 3 $Y_j$ response variables -- representing cigarette burn rates in inches per 1,000 seconds, percent sugar in the leaf, and percent nicotine in the leaf, respectively.

```r
tobacco_x <- tobacco %>%
    select(starts_with("X"))

tobacco_y <- tobacco %>%
    select(starts_with("Y"))
```


Below we see that there is correlation not only among the $X_i$s but also among the $Y_j$s. The classical multivariate regression model will not capture that information.

We can get a good visual look at the correlation structure using GGally::ggcorr(). GGally is a package that extends the functionality of ggplot2 and is used by rrr.

```r
GGally::ggcorr(tobacco_x)

GGally::ggcorr(tobacco_y)
```

```r
## multivariate regression

x <- as.matrix(tobacco_x)
y <- as.matrix(tobacco_y)

multivar_reg <- t(cov(y, x) %*% solve(cov(x)))

## separate multiple regressions, one per response

lm1 <- lm(y[,1] ~ x)$coeff
lm2 <- lm(y[,2] ~ x)$coeff
lm3 <- lm(y[,3] ~ x)$coeff
```


### Estimate $t$ and ridge constant $k$ with rank_trace()

```r
rank_trace(digits_features, digits_features, type = "pca")
```


Print the data frame of rank trace coordinates by setting plot = FALSE.

```r
rank_trace(digits_features, digits_features, type = "pca", plot = FALSE)
```


### Pairwise Plots with pairwise_plot()

```r
args(pairwise_plot)
```


A common PCA method of visualization for diagnostic and analysis purposes is to plot the $j$th sample PC scores against the $k$th PC scores,

$$\left(\xi_{ij}, \xi_{ik}\right) = \left(\hat{\mathbf{v}}_j^\tau \mathbf{X}_i, \hat{\mathbf{v}}_k^\tau \mathbf{X}_i\right), \quad i = 1, 2, \dots, n$$

Since the first two principal components will capture the most variance -- and hence the most useful information -- of all possible pairs of principal components, we typically would set $j = 1, k = 2$ and plot the first two sample PC scores against each other. In rrr this is the default.
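The scores themselves are just projections of the centered data onto the sample eigenvectors. A small sketch with base R's prcomp() (using iris as stand-in data, since it ships with R):

```r
# Sketch: sample PC scores xi_ij = v_j' x_i, computed by hand and via prcomp().
X <- scale(as.matrix(iris[, 1:4]), scale = FALSE)   # center the features
pca <- prcomp(X, center = FALSE)

scores_manual <- X %*% pca$rotation[, 1:2]          # first two PC score vectors
all.equal(unname(scores_manual), unname(pca$x[, 1:2]))   # TRUE
```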

```r
pairwise_plot(digits_features, digits_class, type = "pca")
```


We can set the $x$- and $y$-axes to whichever pair of PC scores we would like to plot by changing the pair_x and pair_y arguments.

```r
pairwise_plot(digits_features, digits_class, type = "pca", pair_x = 1, pair_y = 3)
```


### Plot all pairs of PC scores with pca_allpairs_plot()

Alternatively, we can look at structure in the data by plotting all PC pairs, along with some other visual diagnostics, with pca_allpairs_plot(). In addition to plotting principal component scores against each other, the plot matrix includes histograms and box plots showing how the points are distributed along the principal-component axes.

```r
#args(pca_allpairs_plot)

#pca_allpairs_plot(digits_features, rank = 3, class_labels = digits_class)
```


### Fit model with rrr()

```r
rrr(digits_features, digits_features, type = "pca", rank = 3)
```


## Canonical Variate Analysis

### CVA as a Special Case of Reduced-Rank Regression

Canonical Variate Analysis^[Hotelling, H. (1936). Relations between two sets of variates, Biometrika, 28, 321-377.] is a method of linear dimensionality reduction.

It is assumed that $\left(\mathbf{X}, \mathbf{Y}\right)$ are jointly distributed with

$$\mathrm{E}\left\{ \begin{pmatrix} \mathbf{X} \\ \mathbf{Y} \end{pmatrix} \right\} = \begin{pmatrix} \boldsymbol{\mu}_X \\ \boldsymbol{\mu}_Y \end{pmatrix}, \quad \mathrm{cov}\left\{ \begin{pmatrix} \mathbf{X} \\ \mathbf{Y} \end{pmatrix} \right\} = \begin{pmatrix} \mathbf{\Sigma}_{XX} & \mathbf{\Sigma}_{XY} \\ \mathbf{\Sigma}_{YX} & \mathbf{\Sigma}_{YY} \end{pmatrix}$$

The $t$ new pairs of canonical variables $\left(\xi_i, \omega_i\right), i = 1, \dots, t$ are calculated by fitting a reduced rank regression equation. The canonical variate scores are given by

$$\mathbf{\xi}^{\left(t\right)} = \mathbf{G}^{\left(t\right)}\mathbf{X}, \quad \mathbf{\omega}^{\left(t\right)} = \mathbf{H}^{\left(t\right)} \mathbf{Y},$$

with

\begin{aligned} \mathbf{\Gamma} & = \mathbf{\Sigma}_{YY}^{-1} \\ \mathbf{G}^{\left(t\right)} & = \mathbf{B}^{\left(t\right)} \\ \mathbf{H}^{\left(t\right)} & = \mathbf{A}^{\left(t\right)-} \end{aligned}

where $\mathbf{A}^{\left(t\right)}, \mathbf{B}^{\left(t\right)}$ are the matrices from the reduced-rank regression formulation above.

Note that $\mathbf{H}^{\left(t\right)} = \mathbf{A}^{\left(t\right)-}$ is a generalized inverse of $\mathbf{A}^{\left(t\right)}$. When $t = s$, $\mathbf{H}^{\left(s\right)} = \mathbf{A}^{\left(s\right)+}$ is the unique Moore-Penrose generalized inverse of $\mathbf{A}^{\left(s\right)}$.
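Base R's cancor() computes a differently normalized version of these canonical variate pairs; a quick sketch on iris as stand-in data (not part of this vignette's analysis):

```r
# Sketch: canonical variate scores xi = G x, omega = H y via stats::cancor().
# cancor's scaling differs from rrr's, but the canonical correlations agree.
X <- as.matrix(iris[, 1:2])
Y <- as.matrix(iris[, 3:4])
cc <- cancor(X, Y)

xi    <- scale(X, center = cc$xcenter, scale = FALSE) %*% cc$xcoef
omega <- scale(Y, center = cc$ycenter, scale = FALSE) %*% cc$ycoef

# correlation of the first canonical pair reproduces cc$cor[1]
cor(xi[, 1], omega[, 1])
```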

## The COMBO17 Data Set

### COMBO-17 galaxy data
```r
data(COMBO17)

galaxy <- as_data_frame(COMBO17) %>%
    select(-starts_with("e."), -Nr, -UFS:-IFD) %>%
    na.omit()

glimpse(galaxy)
```

```r
galaxy_x <- galaxy %>%
    select(-Rmag:-chi2red)

galaxy_y <- galaxy %>%
    select(Rmag:chi2red)

GGally::ggcorr(galaxy_x)

GGally::ggcorr(galaxy_y)
```


### Assessing Effective Dimensionality

Estimate $t$ and $k$ with rank_trace()

```r
rank_trace(galaxy_x, galaxy_y, type = "cva")
```


### Diagnostics

Calculate residuals with residuals(), setting type = "cva".

```r
residuals(galaxy_x, galaxy_y, type = "cva", rank = 2, k = 0.001, plot = FALSE)
```


### Plot Residuals

Plot residuals with residuals(), setting type = "cva".

```r
residuals(galaxy_x, galaxy_y, type = "cva", rank = 2, k = 0.001)
```


### Plot Pairwise Canonical Variate Scores with pairwise_plot()

```r
pairwise_plot(galaxy_x, galaxy_y, type = "cva", pair_x = 1, k = 0.0001)

pairwise_plot(galaxy_x, galaxy_y, type = "cva", pair_x = 2, k = 0.0001)

pairwise_plot(galaxy_x, galaxy_y, type = "cva", pair_x = 3)

pairwise_plot(galaxy_x, galaxy_y, type = "cva", pair_x = 6)
```


### Fit Reduced-Rank Canonical Variate Model

Fit model with rrr(), setting type = "cva".

```r
rrr(galaxy_x, galaxy_y, type = "cva", rank = 2, k = 0.0001)
```


## Linear Discriminant Analysis

### LDA as a Special Case of CVA

Linear discriminant analysis is a classification procedure. We can turn it into a regression procedure -- specifically a reduced-rank canonical variate procedure -- in the following way.

Let each observation $i = 1, 2, \dots, n$ belong to one, and only one, of $K = s + 1$ distinct classes.

We can construct an indicator response matrix $\mathbf{Y}$ where each row $i$ is an indicator response vector for the $i$th observation. The vector has a 1 in the column representing the class to which the observation belongs and is 0 elsewhere.

We then regress this binary response matrix $\mathbf{Y}$ on the matrix $\mathbf{X}$ of predictor variables.
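A minimal sketch of this indicator coding, using hypothetical class labels:

```r
# Sketch: indicator response matrix Y, one column per class, a single 1 per row.
class_labels <- factor(c("a", "b", "a", "c", "b"))
Y <- model.matrix(~ class_labels - 1)   # 5 x 3 indicator matrix

rowSums(Y)    # every row sums to 1
colSums(Y)    # class counts: 2, 2, 1
```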

Linear discriminant analysis requires the assumptions that each class is normally distributed and that the covariance matrix of each class is equal to all others.

While these assumptions will not be met in all cases, when they are -- and when the classes are well separated -- linear discriminant analysis is a very efficient classification method.

## The iris Data Set

```r
data(iris)

iris <- as_data_frame(iris)

glimpse(iris)

iris_features <- iris %>%
    select(-Species)

iris_class <- iris %>%
    select(Species)
```


### Assessing Effective Dimensionality

Assessing the rank $t$ of this reduced-rank regression is equivalent to determining the number of linear discriminant functions that best discriminate between the $K$ classes, with $\mathrm{min}\left(r, s\right) = \mathrm{min}\left(r, K - 1\right)$ being the maximum number of linear discriminant functions.
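For the iris data, $r = 4$ features and $K = 3$ classes, so at most $\mathrm{min}\left(4, 2\right) = 2$ discriminant functions exist. A quick check with MASS::lda (a standard LDA implementation, not part of rrr):

```r
# Sketch: the number of discriminant functions is min(r, K - 1) = 2 for iris.
library(MASS)
fit <- lda(Species ~ ., data = iris)
ncol(fit$scaling)   # 2
```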

Generally, plotting linear discriminant functions against each other, i.e., the first and second linear discriminant functions, is used to determine whether sufficient discrimination is obtained.

Plotting techniques are discussed in the following section.

### Plotting

Plot LDA pairs with pairwise_plot(), setting type = "lda".

A typical graphical display for multiclass LDA is to plot the $j$th discriminant scores for the $n$ points against the $k$th discriminant scores.

```r
pairwise_plot(iris_features, iris_class, type = "lda", k = 0.0001)
```


### Fitting LDA Models

Fit LDA model with rrr(), setting type = "lda".

```r
rrr(iris_features, iris_class, type = "lda", k = 0.0001)
```


### Calculate LDA Scores with scores()

```r
scores(iris_features, iris_class, type = "lda", k = 0.0001)
```

