dajmcdon
). Include your buddy in the author field if you are working together.
The leaf dataset contains data derived from images of 40 different species of leaves. Examine the file LeafDescription.pdf to see some example images and detailed descriptions of the different covariates. Use this description and the data to perform the analysis.
names(leaf) = c('Species','Specimen_Number','Eccentricity','Aspect_Ratio', 'Elongation','Solidity','Stochastic_Convexity','Isoperimetric_Factor', 'Maximal_Indent_Depth','Lobedness','Average_Intensity', 'Average_Contrast','Smoothness','Third_moment','Uniformity','Entropy')
Create a new variable lobes which collapses the different leaf species into two groups: those in Species 5-7, 11, 15, 23, and 30 (many), versus the rest (one). Produce a pairs plot of all continuous predictors. Color the points by lobes.
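The grouping and the colored pairs plot described above can be sketched as follows. This is only an illustration under stated assumptions: the helper add_lobes and the vector many_species are hypothetical names, and the toy data frame is simulated stand-in data (not the real leaf measurements) so that the snippet runs on its own. With the real data you would call add_lobes(leaf) after the names(leaf) assignment.

```r
# Species numbers the assignment places in the "many" group.
many_species <- c(5:7, 11, 15, 23, 30)

# Hypothetical helper: add a two-level factor `lobes` to a data frame
# that has a numeric `Species` column.
add_lobes <- function(dat) {
  dat$lobes <- factor(ifelse(dat$Species %in% many_species, "many", "one"),
                      levels = c("one", "many"))
  dat
}

# Simulated stand-in rows (NOT the real leaf data), only so the sketch runs.
set.seed(1)
toy <- data.frame(
  Species = c(1, 5, 6, 7, 11, 12, 15, 23, 30, 36),
  Specimen_Number = 1,
  Eccentricity = runif(10),
  Solidity = runif(10),
  Lobedness = runif(10)
)
toy <- add_lobes(toy)

# Continuous predictors: everything except the identifiers and the new factor.
continuous <- setdiff(names(toy), c("Species", "Specimen_Number", "lobes"))

# Pairs plot, colored by group ("one" = black, "many" = red).
pairs(toy[, continuous],
      col = c("black", "red")[as.integer(toy$lobes)],
      pch = 19)
```

Fixing the factor levels explicitly keeps the color mapping stable regardless of which group appears first in the data.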
Train a tree classifier based on this data for predicting complexity. Use all predictors (not Species or Specimen_Number, obviously). Prune your tree using cross-validation to choose the depth and plot the tree (see Slide 13 from this week). Produce a confusion matrix and find the tree's in-sample error rate.
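One possible shape for the fit, prune, and evaluate steps is sketched below. It assumes the rpart package, which may not be the package used on the course slides, and it prunes by rpart's cost-complexity parameter (chosen by cross-validated error) rather than by depth directly. The toy data frame is simulated stand-in data, not the leaf measurements; with the real data you would first drop Species and Specimen_Number so that the formula lobes ~ . uses only the remaining predictors.

```r
library(rpart)  # assumption: rpart; the course material may use another package

# Simulated stand-in data (NOT the real leaf data), only so the sketch runs.
set.seed(1)
n <- 200
lobes <- factor(sample(c("one", "many"), n, replace = TRUE),
                levels = c("one", "many"))
toy <- data.frame(
  lobes = lobes,
  Solidity = rnorm(n, mean = ifelse(lobes == "many", 0.6, 0.9), sd = 0.1),
  Lobedness = rnorm(n, mean = ifelse(lobes == "many", 2, 0.2), sd = 0.5),
  Entropy = rnorm(n)
)

# Grow a large tree (cp = 0) with 10-fold cross-validation recorded in cptable.
full <- rpart(lobes ~ ., data = toy, method = "class",
              control = rpart.control(cp = 0, xval = 10))

# Pick the complexity parameter with the smallest cross-validated error.
cp_table <- full$cptable
best_cp <- cp_table[which.min(cp_table[, "xerror"]), "CP"]
pruned <- prune(full, cp = best_cp)

# Plot the pruned tree; skip plotting if pruning left only the root node.
if (nrow(pruned$frame) > 1) {
  plot(pruned, margin = 0.1)
  text(pruned, use.n = TRUE)
}

# In-sample predictions, confusion matrix, and misclassification rate.
pred <- predict(pruned, type = "class")
conf_mat <- table(predicted = pred, actual = toy$lobes)
err_rate <- mean(pred != toy$lobes)
print(conf_mat)
print(err_rate)
```

The in-sample error rate is the share of off-diagonal counts in the confusion matrix, so it will generally understate the error on new leaves.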