dajmcdon
). Include your buddy in the author field if you are working together.

When the number of features $p$ is large, the performance of KNN and other local approaches tends to deteriorate. These methods predict using only observations that are near the test observation for which a prediction must be made. This phenomenon is known as the curse of dimensionality, and it ties into the fact that non-parametric approaches often perform poorly when $p$ is large. We will now investigate this curse.
Suppose that we have a set of observations, each with a measurement on a single feature, `X`. We assume that `X` is uniformly distributed on $[0,1]$. Associated with each observation is a response value. Suppose that we wish to predict a test observation’s response using only observations that are within 10% of the range of `X` closest to that test observation. For instance, in order to predict the response for a test observation with `X = 0.6`, we will use observations in the range `[0.55, 0.65]`. On average, what fraction of the available observations will we use to make the prediction?

Solution:
Now suppose that we have a set of observations, each with measurements on two features, `X1` and `X2`. We assume that `(X1, X2)` are uniformly distributed on $[0,1]\times[0,1]$. We wish to predict a test observation’s response using only observations that are within 10% of the range of `X1` and within 10% of the range of `X2` closest to that test observation. For instance, in order to predict the response for a test observation with `X1 = 0.6` and `X2 = 0.35`, we will use observations in the range `[0.55, 0.65]` for `X1` and in the range `[0.3, 0.4]` for `X2`. On average, what fraction of the available observations will we use to make the prediction?

Solution:
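One way to sanity-check an answer to the questions above is a small Monte Carlo simulation. The sketch below is only an illustration: the function name `frac_used` and its arguments are hypothetical, and it assumes that a neighborhood is simply cut off where it extends past the edge of $[0,1]$. Away from the boundaries a neighborhood has volume `width^p`, so the simulated fractions can be compared against that value.

```r
# Monte Carlo estimate of the average fraction of training points that fall
# inside the neighborhood of a test point, for p uniform features on [0, 1].
# Assumption: neighborhoods are truncated at the boundary of the unit cube.
set.seed(1)

frac_used <- function(p, n_train = 2000, n_test = 200, width = 0.1) {
  train <- matrix(runif(n_train * p), ncol = p)
  test <- matrix(runif(n_test * p), ncol = p)
  fracs <- vapply(seq_len(n_test), function(i) {
    # Subtract the test point from every training row, then keep rows where
    # every coordinate lies within width / 2 of the test point.
    inside <- abs(sweep(train, 2, test[i, ])) <= width / 2
    mean(rowSums(inside) == p)
  }, numeric(1))
  mean(fracs)
}

frac_1d <- frac_used(1)  # one feature
frac_2d <- frac_used(2)  # two features
print(c(p1 = frac_1d, p2 = frac_2d))
```

Calling `frac_used()` with larger values of `p` shows how quickly the usable fraction shrinks as features are added.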
Solution:
Solution:
Solution:
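    # Recode Species as a two-level factor: virginica vs. everything else
    data("iris")
    iris$Species = as.factor(iris$Species == 'virginica')
    levels(iris$Species) = c('not virginica','virginica')

    library(GGally)
    library(tidyverse)
    library(MASS)
    library(class)

    # Pairwise plots of the four measurements, colored by class
    ggpairs(iris, aes(color=Species), columns=1:4)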
Estimate logistic regression and LDA using the `iris` data. Does logistic regression throw a warning? Why?
    logit_iris =
    lda_iris =
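As a rough sketch of what the two fits might look like, the block below uses `glm()` with a binomial family and `MASS::lda()`, with all four measurements as predictors. The object names (`iris_bin`, `logit_sketch`, `lda_sketch`) and the 0.5 probability cutoff are assumptions made for illustration, not part of the assignment. The error rates are computed on the training data, so they are likely optimistic.

```r
library(MASS)

# Work on a copy so the original data set is left untouched.
iris_bin <- datasets::iris
iris_bin$Species <- as.factor(iris_bin$Species == "virginica")
levels(iris_bin$Species) <- c("not virginica", "virginica")

# Logistic regression: models the probability of the second level, "virginica".
logit_sketch <- glm(Species ~ ., data = iris_bin, family = binomial)

# Linear discriminant analysis on the same predictors.
lda_sketch <- lda(Species ~ ., data = iris_bin)

# Classify with an assumed 0.5 cutoff and compute training error rates.
logit_prob <- predict(logit_sketch, type = "response")
logit_pred <- ifelse(logit_prob > 0.5, "virginica", "not virginica")
lda_pred <- predict(lda_sketch)$class

logit_err <- mean(logit_pred != iris_bin$Species)
lda_err <- mean(lda_pred != iris_bin$Species)
print(c(logistic = logit_err, lda = lda_err))
```

Any warning that `glm()` emits is printed when the fit runs, so it can be read directly from the console output.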
Estimate KNN using a range of values of $k$. Choose the best $k$ using CV as in the lecture.
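The lecture's CV procedure is not reproduced here, so the following is only a sketch of one option: leave-one-out cross-validation with `class::knn.cv()`. The grid `1:30`, the standardization of the predictors with `scale()`, and the names `x`, `y`, `ks`, `cv_err`, and `best_k` are all assumptions chosen for illustration.

```r
library(class)

# Build the two-class response and a standardized predictor matrix.
iris_knn <- datasets::iris
y <- factor(ifelse(iris_knn$Species == "virginica",
                   "virginica", "not virginica"))
x <- scale(iris_knn[, 1:4])

# Candidate neighborhood sizes (an assumed grid).
ks <- 1:30

# knn.cv() breaks ties at random, so fix the seed for reproducibility.
set.seed(1)

# Leave-one-out CV misclassification rate for each k.
cv_err <- vapply(ks, function(k) mean(knn.cv(x, y, k = k) != y), numeric(1))

# Smallest k that attains the minimum CV error.
best_k <- ks[which.min(cv_err)]
print(best_k)
```

Plotting `cv_err` against `ks` is a quick way to see how flat the error curve is near the chosen `k`.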
Which method has the lowest classification error?