library(learnr)
library(printr)
knitr::opts_chunk$set(echo = TRUE, eval = TRUE)
We begin this laboratory by pointing to some useful resources regarding multivariate statistics:
Before beginning the analysis, we load the required libraries.
library(plotly)
library(corrplot)
library(RColorBrewer)
library(gclus)
library(hexbin)
library(scatterplot3d)
The data was extracted from the 1974 Motor Trend US magazine. It comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973-74 models).
data(mtcars)
mtcars
attach(mtcars)  # makes the columns (mpg, wt, disp, ...) available by name
We plot the data.
plot(mtcars)
There are too many variables! We select a subset of them: the first seven columns.
plot(mtcars[1:7])
Can you see any kind of dependence? Among which variables?
We also draw a 3D scatter plot.
p <- plot_ly(mtcars, x = ~mpg, y = ~wt, z = ~qsec) %>%
  add_markers() %>%
  layout(scene = list(xaxis = list(title = 'mpg'),
                      yaxis = list(title = 'wt'),
                      zaxis = list(title = 'qsec')))
p
We make star plots as well.
stars(mtcars[, 1:7], key.loc = c(14, 2),
      main = "Motor Trend Cars : stars(*, full = F)", full = FALSE)
stars(mtcars[, 1:7], key.loc = c(14, 1.5),
      main = "Motor Trend Cars : full stars()", flip.labels = FALSE)
We compute the mean vector in two equivalent ways,
colMeans(mtcars)
apply(mtcars, 2, mean)
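The mean vector is $\bar{\mathbf{x}} = \frac{1}{n}\sum_{i=1}^{n} \mathbf{x}_i$, that is, the vector of column means. The following minimal sketch, which uses only base R, checks that dividing the column sums by the number of rows $n$ gives the same result as the two calls above:

```r
data(mtcars)

# Mean vector: column sums divided by the number of observations n
n <- nrow(mtcars)
xbar <- colSums(mtcars) / n

# Compare with the built-in helpers (all.equal allows floating-point error)
all.equal(xbar, colMeans(mtcars))
all.equal(xbar, apply(mtcars, 2, mean))
```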
We also compute the variance-covariance matrix,
var(mtcars)
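`var()` returns the sample covariance matrix $S = \frac{1}{n-1}(X - \mathbf{1}\bar{\mathbf{x}}^\top)^\top (X - \mathbf{1}\bar{\mathbf{x}}^\top)$, where the denominator is $n - 1$. The following sketch rebuilds $S$ by hand by centring the data matrix and then taking the cross-product:

```r
data(mtcars)
X <- as.matrix(mtcars)
n <- nrow(X)

# Centre each column by subtracting its mean
Xc <- sweep(X, 2, colMeans(X))

# Sample covariance matrix: t(Xc) %*% Xc divided by (n - 1), as in var()
S <- crossprod(Xc) / (n - 1)

all.equal(S, var(mtcars))
```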
the correlation matrix (each entry lies in $[-1, 1]$),
cor(mtcars)
and round it to two decimal places to make it easier to read.
round(cor(mtcars), 2)
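The correlation matrix is obtained from the covariance matrix by rescaling: $R = D^{-1/2} S D^{-1/2}$, where $D$ is the diagonal matrix of the variances in $S$. Each entry is therefore $r_{jk} = s_{jk} / \sqrt{s_{jj} s_{kk}}$. The following sketch checks this identity against `cor()` and against base R's `cov2cor()`:

```r
data(mtcars)
S <- var(mtcars)

# D^{-1/2}: diagonal matrix of reciprocal standard deviations
D_inv_sqrt <- diag(1 / sqrt(diag(S)))

# Correlation matrix R = D^{-1/2} S D^{-1/2}
R <- D_inv_sqrt %*% S %*% D_inv_sqrt
dimnames(R) <- dimnames(S)

all.equal(R, cor(mtcars))
all.equal(R, cov2cor(S))
```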
We now draw a correlogram with `corrplot()`. Positive correlations are displayed in blue and negative correlations in red. Colour intensity and circle size are proportional to the absolute value of the correlation coefficients. The argument `order = "hclust"` reorders the variables by hierarchical clustering so that similar variables appear next to each other. The colour legend on the right side of the correlogram maps correlation values to colours.
M <- cor(mtcars)
corrplot(M, type = "upper", order = "hclust",
         col = brewer.pal(n = 8, name = "RdYlBu"))
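To see which permutation `order = "hclust"` applies, we can ask corrplot for it directly through `corrMatOrder()`. This sketch assumes that both calls use corrplot's default `hclust.method = "complete"`:

```r
library(corrplot)

data(mtcars)
M <- cor(mtcars)

# Permutation of the variables used by order = "hclust"
ord <- corrMatOrder(M, order = "hclust")

# Variable names in the order they appear in the correlogram
colnames(M)[ord]
```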
Notice that we can customise the plot. For instance, the `method` argument sets how each coefficient is drawn.
corrplot(M, method = "circle", type = "upper")
corrplot(M, method = "pie", type = "upper")
corrplot(M, method = "ellipse", type = "upper")  # nice
corrplot(M, method = "color", type = "upper")
corrplot(M, method = "number", type = "upper")
We now visualise the correlation matrix as a heatmap, with the clustering trees (dendrograms) displayed along its margins.
col <- colorRampPalette(c("blue", "white", "red"))(20)
heatmap(x = M, col = col, symm = TRUE)
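By default, `heatmap()` builds its dendrograms with `dist()` and `hclust()`. The following sketch draws a tree of the same kind on its own. One caveat: `heatmap()` also reorders the branches by row means, so the leaf order shown in the heatmap may differ from this plot.

```r
data(mtcars)
M <- cor(mtcars)

# Same default ingredients as heatmap(): Euclidean distances between rows, then hclust()
hc <- hclust(dist(M))

# Draw the clustering tree on its own
plot(hc, main = "Hierarchical clustering of the rows of M")
```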
We now consider only four variables: `mpg`, `disp`, `drat` and `wt`.
(data <- mtcars[, c(1, 3, 5, 6)])
We find the mean vector,
colMeans(data)
the variance-covariance matrix,
var(data)
and the correlation matrix.
cor(data)
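Another way to read the correlation matrix: it is the covariance matrix of the standardised data. The following sketch uses base R's `scale()` to centre each column and divide it by its standard deviation, and then compares the two matrices:

```r
data(mtcars)
data <- mtcars[, c(1, 3, 5, 6)]

# Standardise each column: subtract its mean, divide by its standard deviation
Z <- scale(data)

# The covariance matrix of the standardised data equals the correlation matrix
all.equal(var(Z), cor(data), check.attributes = FALSE)
```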
We make a basic scatterplot matrix.
pairs(data, main = "Simple Scatterplot Matrix")
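We then colour the panels by the absolute value of the correlations and reorder the variables so that the most correlated pairs lie close to the diagonal.
data.r <- abs(cor(data))        # absolute values of the correlations
data.col <- dmat.color(data.r)  # panel colours based on those values
data.o <- order.single(data.r)  # ordering that places similar variables next to each other
cpairs(data, data.o, panel.colors = data.col, gap = .5,
       main = "Variables Ordered and Colored by Correlation")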
We make some 3D plots. The first one is interactive:
p <- plot_ly(mtcars, x = ~mpg, y = ~wt, z = ~qsec) %>%
  add_markers() %>%
  layout(scene = list(xaxis = list(title = 'mpg'),
                      yaxis = list(title = 'wt'),
                      zaxis = list(title = 'qsec')))
p
In the second one, we project the points onto a plane (a static 3D scatter plot),
scatterplot3d(wt, disp, mpg, main = "3D Scatterplot")
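and then add some colour: with `highlight.3d = TRUE` the points are coloured according to their `y` coordinate, and `type = "h"` draws vertical drop lines.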
scatterplot3d(wt, disp, mpg, pch = 16, highlight.3d = TRUE, type = "h", main = "3D Scatterplot")
We now simulate 1000 pairs of independent standard normal observations.
x <- rnorm(1000)
y <- rnorm(1000)
We bin the plane into hexagonal cells and count the points falling in each occupied cell.
bin <- hexbin(x, y, xbins = 50)
summary(bin)
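A `hexbin` object stores only the occupied cells, with one count per non-empty hexagon. Because every point falls in exactly one cell, the counts must add up to the number of points. The following sketch checks this through the object's `count` and `ncells` slots and prints the centres of the first few cells with `hcell2xy()`. The seed is an arbitrary choice, added only for reproducibility.

```r
library(hexbin)

set.seed(1)  # arbitrary seed, for reproducibility only
x <- rnorm(1000)
y <- rnorm(1000)
bin <- hexbin(x, y, xbins = 50)

# One count per occupied (non-empty) hexagon
length(bin@count) == bin@ncells

# Every point lies in exactly one cell, so the counts sum to the number of points
sum(bin@count) == length(x)

# Centres of the first few occupied hexagons
head(data.frame(hcell2xy(bin)))
```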
And plot it!
plot(bin, main = "Hexagonal Binning")
We apply the same idea to the `mpg` and `wt` columns: first a plain scatterplot, then the hexagonal binning.
plot(data$mpg, data$wt, xlab = "mpg", ylab = "wt", main = "Scatterplot")
bin <- hexbin(data$mpg, data$wt, xbins = 10, xlab = "mpg", ylab = "wt")
plot(bin)
We are going to use the `wine.txt` file.
We first load the data.
wine <- moxier::wine
wine <- wine[, -1]  # drop the first column
colnames(wine) <- c(
  "ID", "Alcohol", "Malic acid", "Ash", "Alcalinity of ash", "Magnesium",
  "Total phenols", "Flavanoids", "Nonflavanoid phenols", "Proanthocyanins",
  "Color intensity", "Hue", "OD280/OD315", "Proline"
)
wine
Describe the data and give a suitable visual representation of the variables in the dataset. Select the relevant variables.
pairs(wine[2:14]) # Is this useful?
pairs(wine[2:6])
Compute the main location and dispersion indices.
apply(wine[2:14], 2, mean)
var(wine[2:14])
cor(wine[2:14])
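Location and dispersion are usually summarised by more than the mean and the variance. The following sketch defines a hypothetical helper, `describe_columns()` (not part of any package), that collects the mean, median, standard deviation, interquartile range, minimum and maximum of each column. It is demonstrated on `mtcars` so that it runs without the wine data. Assuming columns 2 to 14 of `wine` are numeric, `describe_columns(wine[2:14])` should work in the same way.

```r
data(mtcars)

# Hypothetical helper: common location and dispersion indices for each column
describe_columns <- function(df) {
  t(sapply(df, function(v) c(
    mean   = mean(v),
    median = median(v),
    sd     = sd(v),
    IQR    = IQR(v),
    min    = min(v),
    max    = max(v)
  )))
}

# One row per variable, one column per index
round(describe_columns(mtcars[, 1:7]), 2)
```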
Select a couple of continuous and/or categorical variables and analyse them separately, as in the univariate analyses of the previous lessons.
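Here is a minimal sketch of such a univariate analysis. It is shown on `mtcars` so that it runs without the wine data. For illustration only, `mpg` is treated as a continuous variable and `cyl` (number of cylinders) as a categorical one.

```r
data(mtcars)

# Continuous variable: numerical summary, histogram and boxplot
summary(mtcars$mpg)
hist(mtcars$mpg, main = "Histogram of mpg", xlab = "mpg")
boxplot(mtcars$mpg, main = "Boxplot of mpg", ylab = "mpg")

# Categorical variable: absolute and relative frequencies, then a bar plot
cyl_counts <- table(mtcars$cyl)
cyl_counts
prop.table(cyl_counts)
barplot(cyl_counts, main = "Number of cylinders", xlab = "cyl", ylab = "count")
```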