library(learnr)
library(printr)
knitr::opts_chunk$set(echo = TRUE, eval = TRUE)

Multivariate Statistics

Introduction

We begin this laboratory by pointing to some useful resources regarding multivariate statistics:

Before beginning the analysis, we load the required libraries.

library(plotly)
library(corrplot)
library(RColorBrewer)
library(gclus)
library(hexbin)
library(scatterplot3d)

MTCARS data

The data was extracted from the 1974 Motor Trend US magazine. It comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973-74 models).

Loading data

data(mtcars)
mtcars
attach(mtcars)

Pairs

We plot the data.

plot(mtcars)

There too many variables! We try to select the interesting ones.

plot(mtcars[1:7])

Can you see any kind of dependence? Among which variables?

3d scatter plot

We also plot a 3d scatter plot.

p <- plot_ly(mtcars, x = ~mpg, y = ~wt, z = ~qsec) %>%
  add_markers() %>%
  layout(scene = list(xaxis = list(title = 'mpg'),
                     yaxis = list(title = 'wt'),
                     zaxis = list(title = 'qsec')))

p

Star plots

We make a star plots as well.

stars(mtcars[, 1:7],
  key.loc = c(14, 2),
  main = "Motor Trend Cars : stars(*, full = F)", full = FALSE
)
stars(mtcars[, 1:7],
  key.loc = c(14, 1.5),
  main = "Motor Trend Cars : full stars()", flip.labels = FALSE
)

Location and dispersion indices

We have the mean vector,

colMeans(mtcars)
apply(mtcars,2,mean)

We also compute the Variance-Covariance Matrix,

var(mtcars)

the correlation Matrix (with each entry within $[-1, 1]$),

cor(mtcars)

and we round it to make it easier to read it.

round(cor(mtcars), 2) 

Scatterplot highlighting correlations

We now make a corrplot: positive correlations are displayed in blue and negative correlations in red. Colour intensity and the size of the circle are proportional to the correlation coefficients. In the right side of the correlogram, the legend color shows the correlation coefficients and the corresponding colors.

M <- cor(mtcars)
corrplot(M, type = "upper", order = "hclust", col = brewer.pal(n = 8, name = "RdYlBu"))

Notice that we can customise the plot.

corrplot(M, method = "circle", type = "upper")
corrplot(M, method = "pie", type = "upper")
corrplot(M, method = "ellipse", type = "upper") # nice
corrplot(M, method = "color", type = "upper")
corrplot(M, method = "number", type = "upper")

Heatmap

We now visualise using a heatmap and the display of a clustering tree.

col <- colorRampPalette(c("blue", "white", "red"))(20)
heatmap(x = M, col = col, symm = TRUE)

Reduce data set

We now consider only a few variables.

(data <- mtcars[, c(1, 3, 5, 6)])

We find the mean vector,

colMeans(data)

the variance matrix

var(data)
cor(data)

Basic Scatterplot Matrix

We make a basic scatterplot matrix.

pairs(data, main = "Simple Scatterplot Matrix")
data.r <- abs(cor(data)) # get absolute values of correlations
data.col <- dmat.color(data.r) # get colors
cpairs(data, panel.colors = data.col, gap = .5, main = "Variables Ordered and Colored by Correlation")

3d Plots

We make some 3d plots! One is interactive:

p <- plot_ly(mtcars, x = ~mpg, y = ~wt, z = ~qsec) %>%
  add_markers() %>%
  layout(scene = list(xaxis = list(title = 'mpg'),
                     yaxis = list(title = 'wt'),
                     zaxis = list(title = 'qsec')))

p

And here we project on a plane

scatterplot3d(wt, disp, mpg, main = "3D Scatterplot")

and add some colour.

scatterplot3d(wt, disp, mpg, pch = 16, highlight.3d = TRUE, type = "h", main = "3D Scatterplot")

High Density Scatterplot with Binning

We now simulate some data.

x <- rnorm(1000)
y <- rnorm(1000)

We make some bins: we count the points falling in each occupied cell.

bin <- hexbin(x, y, xbins = 50)
summary(bin)

And plot it!

plot(bin, main = "Hexagonal Binning")

Back to MTCARS data

We plot the data using the bins.

plot(data$mpg, data$wt, xlab = "mpg", ylab = "wt", main = "Scatterplot")
bin <- hexbin(data$mpg, data$wt, xbins = 10, xlab = "mpg", ylab = "wt")
plot(bin)

Exercise

We are going to use the wine.txt file.

We first collect the data.

wine <- moxier::wine
wine <- wine[, -1]
colnames(wine) <- c(
  "ID", "Alcohol", "Malic acid", "Ash", "Alcalinity of ash", "Magnesium",
  "Total phenols", "Flavanoids", "Nonflavanoid phenols", "Proanthocyanins",
  "Color intensity", "Hue", "OD280/OD315", "Proline"
)
wine

Describe the data and give suitable visual representation of the variables contained in the dataset. Select the relevant variables.


pairs(wine[2:14]) # Is this useful?
pairs(wine[2:6])

Compute the main location and dispersion indices.


apply(wine[2:14],2,mean)  
var(wine)
cor(wine)

Select a couple of continuous and/or categorical variables. Analyse them separaetely, as in the univariate case we have gone through in the previous lessons.




mascaretti/moxier documentation built on March 17, 2020, 3:57 p.m.