Instructions

  1. Rename this document with your student ID (not the 10-digit number; your IU username, e.g., dajmcdon). Include your buddy in the author field if you are working together.
  2. Follow the instructions in each section.

Exercise 4 (ISL 4)

When the number of features $p$ is large, there tends to be a deterioration in the performance of KNN and other local approaches that perform prediction using only observations that are near the test observation for which a prediction must be made. This phenomenon is known as the curse of dimensionality, and it ties into the fact that non-parametric approaches often perform poorly when $p$ is large. We will now investigate this curse.

  1. Suppose that we have a set of observations, each with measurements on $p = 1$ feature, $X$. We assume that $X$ is uniformly distributed on $[0,1]$. Associated with each observation is a response value. Suppose that we wish to predict a test observation's response using only observations that are within 10% of the range of $X$ closest to that test observation. For instance, in order to predict the response for a test observation with $X = 0.6$, we will use observations in the range $[0.55, 0.65]$. On average, what fraction of the available observations will we use to make the prediction?

Solution:
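A sketch of the standard calculation (a proposed solution, not the official one): for $x \in [0.05, 0.95]$ the window $[x - 0.05, x + 0.05]$ has length $0.1$, while windows near the boundary are truncated. Averaging over $X \sim \mathrm{Unif}[0,1]$,

$$\int_0^{0.05} (x + 0.05)\,dx + \int_{0.05}^{0.95} 0.1\,dx + \int_{0.95}^{1} (1.05 - x)\,dx = 0.0975,$$

so on average we use roughly 10% of the available observations (9.75% exactly, once boundary effects are accounted for).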

  2. Now suppose that we have a set of observations, each with measurements on $p = 2$ features, $X_1$ and $X_2$. We assume that $(X_1, X_2)$ are uniformly distributed on $[0,1]\times[0,1]$. We wish to predict a test observation's response using only observations that are within 10% of the range of $X_1$ and within 10% of the range of $X_2$ closest to that test observation. For instance, in order to predict the response for a test observation with $X_1 = 0.6$ and $X_2 = 0.35$, we will use observations in the range $[0.55, 0.65]$ for $X_1$ and in the range $[0.3, 0.4]$ for $X_2$. On average, what fraction of the available observations will we use to make the prediction?

Solution:
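A sketch, under the same assumptions as part (1): since $(X_1, X_2)$ is uniform on the square, the two coordinates are independent and the fractions multiply,

$$0.1 \times 0.1 = 0.01,$$

so about 1% of the observations ($0.0975^2 \approx 0.0095$ if boundary effects are included).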

  3. Now suppose that we have a set of observations on $p = 100$ features. Again the observations are uniformly distributed on each feature, and again each feature ranges in value from 0 to 1. We wish to predict a test observation's response using observations within the 10% of each feature's range that is closest to that test observation. What fraction of the available observations will we use to make the prediction?

Solution:
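A sketch, extending the same argument to $p = 100$ features:

$$0.1^{100} = 10^{-100},$$

a vanishingly small fraction: for any realistic sample size, essentially no observations fall in such a neighborhood.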

  4. Using your answers to parts (1)–(3), argue that a drawback of KNN when $p$ is large is that there are very few training observations "near" any given test observation.

Solution:
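A sketch of the argument: by parts (1)–(3), the fraction of training observations lying within 10% of each feature's range is $0.1^p$, which decays exponentially in $p$. Even with an enormous training set, for large $p$ virtually no observations are "near" a given test point, so KNN must base its prediction on neighbors that are in fact far away, and its performance deteriorates.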

  5. Now suppose that we wish to make a prediction for a test observation by creating a $p$-dimensional hypercube centered around the test observation that contains, on average, 10% of the training observations. For $p = 1, 2,$ and 100, what is the length of each side of the hypercube? Comment on your answer. Note: A hypercube is a generalization of a cube to an arbitrary number of dimensions. When $p = 1$, a hypercube is simply a line segment, when $p = 2$ it is a square, and when $p = 100$ it is a 100-dimensional cube.

Solution:
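A sketch: a hypercube with side length $s$ captures, on average, a fraction $s^p$ of uniformly distributed observations (ignoring boundary effects), so we need $s^p = 0.1$, i.e. $s = 0.1^{1/p}$:

$$p = 1:\ s = 0.1, \qquad p = 2:\ s = 0.1^{1/2} \approx 0.316, \qquad p = 100:\ s = 0.1^{1/100} \approx 0.977.$$

To contain just 10% of the data when $p = 100$, the "local" cube must span nearly the entire range of every feature, so the neighborhood is hardly local at all.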

Data analysis

data("iris")
iris$Species = as.factor(iris$Species == 'virginica')
levels(iris$Species) = c('not virginica','virginica')
library(GGally)
library(tidyverse)
library(MASS)
library(class)
ggpairs(iris, aes(color=Species), columns=1:4)

Estimate logistic regression and LDA using the iris data. Does logistic regression throw a warning? Why?

# a sketch of the standard calls, assuming all four measurements as predictors
logit_iris = glm(Species ~ ., data = iris, family = binomial)
lda_iris = lda(Species ~ ., data = iris)
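If a warning appears, it is most likely glm()'s "fitted probabilities numerically 0 or 1 occurred": the two classes are nearly linearly separable in these four measurements, so maximum likelihood pushes some coefficients toward $\pm\infty$ and the fitted probabilities reach numerical 0 or 1. LDA has no such issue, since its estimates are just class means and a pooled covariance.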

Estimate KNN using a range of $k$. Choose the best $k$ using CV, as in the lecture.
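A minimal sketch using leave-one-out cross-validation via class::knn.cv(); the grid of $k$ values and the seed are arbitrary choices here, and the CV scheme from lecture (e.g., k-fold) may differ:

set.seed(406)  # arbitrary seed; knn.cv() breaks distance ties at random
kgrid = 1:25   # hypothetical range of neighborhood sizes to try
cv_error = sapply(kgrid, function(k) {
  # knn.cv() classifies each observation using all the others (LOO-CV)
  preds = knn.cv(train = iris[, 1:4], cl = iris$Species, k = k)
  mean(preds != iris$Species)
})
best_k = kgrid[which.min(cv_error)]
best_k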


Which method has the lowest classification error?
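A sketch of one possible comparison, using training misclassification error for all three fits for simplicity (substituting the CV errors from above would be fairer); best_k is assumed to come from the previous chunk:

logit_pred = ifelse(predict(logit_iris, type = 'response') > 0.5,
                    'virginica', 'not virginica')
lda_pred = predict(lda_iris)$class
knn_pred = knn(train = iris[, 1:4], test = iris[, 1:4],
               cl = iris$Species, k = best_k)
# training misclassification rate for each method
c(logistic = mean(logit_pred != iris$Species),
  lda = mean(lda_pred != iris$Species),
  knn = mean(knn_pred != iris$Species))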


