library(knitr)
opts_chunk$set(message = FALSE, warning = FALSE, fig.align = 'center',
               fig.width = 10, fig.height = 6, cache = TRUE, autodep = TRUE)

Instructions

  1. Rename this document with your student ID (not the 10-digit number; use your IU username, e.g. dajmcdon). Include your buddy in the author field if you are working together.
  2. Follow the instructions in each section.

Trees and leaves

The leaf dataset contains data derived from images of 40 different species of leaves. Examine the file LeafDescription.pdf to see some example images and detailed descriptions of the different covariates. Use this description and the data to perform the analysis. Note: the included .csv file only has species 1-36.

Analysis

  1. Load the data. Note: only the "simple" leaves are included. You may need to rename the columns with:
names(leaf) = c('Species','Specimen_Number','Eccentricity','Aspect_Ratio',
                'Elongation','Solidity','Stochastic_Convexity','Isoperimetric_Factor',
                'Maximal_Indent_Depth','Lobedness','Average_Intensity',
                'Average_Contrast','Smoothness','Third_moment','Uniformity','Entropy')
  2. Create a new factor called lobes that collapses the leaf species into two groups: Species 5-7, 11, 15, 23, and 30 (many) versus the rest (one).
  3. Produce a pairs plot of all continuous predictors. Color the points by lobes.
  4. Train a tree classifier on this data to predict lobes. Use all predictors (not Species or Specimen_Number, obviously). Prune your tree using cross-validation to choose the depth and plot the tree (see Slide 13 from this week). Produce a confusion matrix and find the tree's in-sample error rate.
  5. Train a random forest using 400 trees. Produce a variable importance plot and a confusion matrix, and find the in-sample error rate.

Load data and pairs plot

library(tidyverse)
library(GGally)

# The file has no header row, so read it raw and supply the column names
leaf = read.csv('leaf.csv', header = FALSE)
names(leaf) = c('Species','Specimen_Number','Eccentricity','Aspect_Ratio',
                'Elongation','Solidity','Stochastic_Convexity','Isoperimetric_Factor',
                'Maximal_Indent_Depth','Lobedness','Average_Intensity',
                'Average_Contrast','Smoothness','Third_moment','Uniformity','Entropy')
# Species %in% c(...) is FALSE/TRUE, so the labels map FALSE -> 'one', TRUE -> 'many'
leaf = leaf %>% mutate(lobes = factor(Species %in% c(5:7, 11, 15, 23, 30),
                                      labels = c('one','many')))
ggpairs(leaf, aes(color = lobes), columns = 3:16)
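
As a quick sanity check (optional, not part of the assignment), it may be worth confirming the species coverage and how unbalanced the two lobes groups are before fitting anything:

# How unbalanced are the two groups, and are only species 1-36 present?
table(leaf$lobes)
range(leaf$Species)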

The tree

library(tree)
library(maptree)

# Grow the full classification tree, excluding the two ID columns
my_tree = tree(lobes ~ . - Species - Specimen_Number, data = leaf)
# 5-fold cross-validation over the cost-complexity pruning sequence
tree_cv = cv.tree(my_tree, K = 5)
# Prune at the penalty k that minimizes the cross-validated deviance
pruned_tree = prune.tree(my_tree, k = tree_cv$k[which.min(tree_cv$dev)])
draw.tree(pruned_tree)
# In-sample confusion matrix and misclassification rate
(tree_conf <- table(predict(pruned_tree, type = 'class'), leaf$lobes))
1 - sum(diag(tree_conf)) / sum(tree_conf)
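
To see why cross-validation settles on this tree, it can help to look at the whole deviance curve. A minimal sketch using the tree_cv object from above (base graphics; nothing beyond what cv.tree already returns):

# CV deviance as a function of tree size; the minimum is the size kept above
plot(tree_cv$size, tree_cv$dev, type = 'b',
     xlab = 'tree size (number of leaves)', ylab = 'CV deviance')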

The forest

library(randomForest)

# Random forest with 400 trees, again excluding the two ID columns
my_forest = randomForest(lobes ~ . - Species - Specimen_Number, data = leaf, ntree = 400)
varImpPlot(my_forest)
# With no newdata, predict() returns out-of-bag (OOB) class predictions
(forest_conf <- table(predict(my_forest), leaf$lobes))
1 - sum(diag(forest_conf)) / sum(forest_conf)
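
One caveat: because predict() on a randomForest object with no newdata returns out-of-bag predictions, the error above is really an OOB estimate rather than a literal in-sample one. If the in-sample (training) error is wanted, a minimal sketch, simply re-predicting on the training data:

# True in-sample error: predict on the training data itself
in_sample_pred <- predict(my_forest, newdata = leaf)
(train_conf <- table(in_sample_pred, leaf$lobes))
1 - sum(diag(train_conf)) / sum(train_conf)  # often (near) zero for a forest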

