library(knitr)
opts_chunk$set(message=FALSE, warning=FALSE, fig.align='center', fig.width=10,
               fig.height=6, cache=TRUE, autodep=TRUE)
dajmcdon
). Include your buddy in the author field if you are working together. The leaf
dataset contains data derived from images of 40 different species of leaves. Examine the file LeafDescription.pdf
to see some example images and detailed descriptions of the different covariates. Use this description and the data to perform the analysis. Note: the included .csv
file only has species 1-36.
names(leaf) = c('Species','Specimen_Number','Eccentricity','Aspect_Ratio', 'Elongation','Solidity','Stochastic_Convexity','Isoperimetric_Factor', 'Maximal_Indent_Depth','Lobedness','Average_Intensity', 'Average_Contrast','Smoothness','Third_moment','Uniformity','Entropy')
Create a new factor variable lobes which collapses the different leaf species into two groups: those in Species 5-7, 11, 15, 23, and 30 (many), versus the rest (one). Fit a classification tree to predict lobes. Do not use Species or Specimen_Number (
obvs). Prune your tree using cross validation to choose the depth and plot the tree (see Slide 13 from this week). Produce a confusion matrix and find the tree's in-sample error rate.

library(tidyverse)
library(GGally)
leaf = read.csv('leaf.csv', header = FALSE)
names(leaf) = c('Species', 'Specimen_Number', 'Eccentricity', 'Aspect_Ratio',
                'Elongation', 'Solidity', 'Stochastic_Convexity', 'Isoperimetric_Factor',
                'Maximal_Indent_Depth', 'Lobedness', 'Average_Intensity',
                'Average_Contrast', 'Smoothness', 'Third_moment', 'Uniformity', 'Entropy')
leaf = leaf %>%
  mutate(lobes = factor(Species %in% c(5:7, 11, 15, 23, 30), labels = c('one', 'many')))
ggpairs(leaf, aes(color = lobes), columns = 3:16)
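
The grouping step depends on how R orders the levels of a logical vector: FALSE sorts before TRUE, so labels = c('one', 'many') gives 'one' to species outside the listed set and 'many' to those inside it. A minimal sketch on a made-up vector of species codes (the values are illustrative and not taken from leaf.csv):

```r
# Toy species codes (hypothetical values, not taken from leaf.csv)
species <- c(1, 5, 6, 11, 12, 30, 36)

# TRUE when the species belongs to the "many lobes" group
is_many <- species %in% c(5:7, 11, 15, 23, 30)

# factor() orders the logical levels as FALSE, TRUE, so the labels map
# FALSE -> 'one' and TRUE -> 'many'
lobes <- factor(is_many, labels = c('one', 'many'))

print(data.frame(species, is_many, lobes))
print(table(lobes))
```

Swapping the order of the labels would silently flip the two groups, so it is worth checking a few rows like this before modelling.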
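
library(tree)
library(maptree)
# Fit the full tree, leaving out the two identifier columns
my_tree = tree(lobes ~ . - Species - Specimen_Number, data = leaf)
# 5-fold cross-validation over the cost-complexity pruning sequence
tree_cv = cv.tree(my_tree, K = 5)
# Prune at the cost-complexity value with the smallest CV deviance
pruned_tree = prune.tree(my_tree, k = tree_cv$k[which.min(tree_cv$dev)])
draw.tree(pruned_tree)
# Confusion matrix (rows: predicted, columns: actual) and in-sample error rate
(tree_conf <- table(predict(pruned_tree, type = 'class'), leaf$lobes))
1 - sum(diag(tree_conf)) / sum(tree_conf)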
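
The error-rate line works because the diagonal of the confusion matrix counts the correctly classified observations. A small sketch with hypothetical predicted and true labels (eight made-up observations, not output from the tree) shows the arithmetic:

```r
# Hypothetical true and predicted classes for 8 observations
truth <- factor(c('one', 'one', 'one', 'one', 'one', 'many', 'many', 'many'),
                levels = c('one', 'many'))
pred  <- factor(c('one', 'one', 'one', 'one', 'many', 'many', 'many', 'one'),
                levels = c('one', 'many'))

# Rows are predictions, columns are the true classes
conf <- table(predicted = pred, actual = truth)
print(conf)

# Diagonal entries are correct classifications; the rest are errors
err <- 1 - sum(diag(conf)) / sum(conf)
print(err)

# The same number computed directly as the share of mismatches
print(mean(pred != truth))
```

Giving both factors the same levels keeps the table square even if one class is never predicted, which is what makes diag() meaningful here.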
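
library(randomForest)
# Random forest with 400 trees, leaving out the two identifier columns
my_forest = randomForest(lobes ~ . - Species - Specimen_Number, data = leaf, ntree = 400)
varImpPlot(my_forest)
# predict() without newdata returns out-of-bag predictions, so this
# confusion matrix and error rate are out-of-bag rather than in-sample
(forest_conf <- table(predict(my_forest), leaf$lobes))
1 - sum(diag(forest_conf)) / sum(forest_conf)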
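
Calling predict() on a randomForest object without newdata returns the out-of-bag predictions, so the forest's error rate is not directly comparable to the tree's in-sample rate. The sketch below illustrates the difference; it assumes the randomForest package is installed and uses the built-in iris data as a stand-in for the leaf data, with an arbitrary seed:

```r
library(randomForest)

set.seed(1)  # arbitrary seed so the sketch is reproducible
rf <- randomForest(Species ~ ., data = iris, ntree = 400)

# No newdata: each observation is predicted only by trees that did not see it
oob_pred <- predict(rf)

# newdata = training data: every tree votes, including those trained on the row
in_pred <- predict(rf, newdata = iris)

oob_err <- mean(oob_pred != iris$Species)
in_err  <- mean(in_pred != iris$Species)
print(c(out_of_bag = oob_err, in_sample = in_err))
```

The out-of-bag figure is usually the more honest estimate of how the forest would do on new data, while the in-sample figure tends to be optimistic.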