Use knn() to fit a 1NN classifier to the sonar training data, predicting the class label in the last column from the other 60 columns.
setwd("./data")
##download.file("nice redirects google","sonar_train.csv", method="curl")
##download.file("https://sites.google.com/site/stats202/data/sonar_train.csv",
## "sonar_test.csv",method="curl")
train<-read.csv("sonar_train.csv",header=F)
test<-read.csv("sonar_test.csv",header=F)
setwd("../")
Load libraries.
library(class)
Fit the model with knn(), using its default settings.
## set the y var as factor for class prediction.
y<-as.factor(train[,61])
## set the remaining columns as the predictor (x) data.
x<-train[,1:60]
## train model to predict y using all x.
fit<-knn(train=x,test=x,cl=y,k=1)
## Note: both the train and test arguments use the same data set here (the training
## data), so this gives the training misclassification error.
Assess the fit on the training data. (Not very meaningful: we are using the training data to predict itself, and with k=1 each point's nearest neighbor is itself, so no real predictive ability is being tested.)
fit
## [1] -1 1 -1 1 1 1 -1 -1 1 -1 -1 1 1 -1 -1 1 -1 1 1 1 1 -1 -1
## [24] 1 1 1 -1 1 1 1 -1 1 1 1 -1 1 1 -1 -1 1 -1 -1 1 1 -1 -1
## [47] 1 -1 1 1 -1 -1 1 -1 1 -1 -1 1 -1 1 1 1 1 -1 -1 -1 -1 1 1
## [70] 1 1 -1 1 -1 1 1 -1 -1 -1 1 1 -1 -1 1 -1 1 -1 -1 -1 -1 1 1
## [93] -1 1 -1 1 -1 -1 1 -1 -1 -1 1 -1 -1 -1 -1 -1 1 1 -1 -1 1 -1 1
## [116] 1 1 -1 -1 -1 -1 -1 -1 1 -1 1 1 1 1 -1
## Levels: -1 1
## fit gives the predicted class labels for the training data.
1-sum(y==fit)/length(y)
## [1] 0
## sum(y==fit) counts how often the prediction (fit) matches the true class
## label (y). Dividing by length(y) gives the overall accuracy, and 1 minus that
## is the overall misclassification error rate. Zero in this case.
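As a side note, the same arithmetic can be wrapped in a small helper; the function name misclass_rate below is illustrative, not part of the original analysis.
## Illustrative helper: misclassification rate = proportion of predictions that
## differ from the true labels (same as 1 - sum(truth==pred)/length(truth)).
misclass_rate <- function(truth, pred) mean(truth != pred)
misclass_rate(y, fit)  ## 0 on the training data with k=1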
Now compute the misclassification error on the test data (the training error was computed above).
y_test<-as.factor(test[,61])
x_test<-test[,1:60]
fit_test<-knn(x,x_test,y,k=1)
1-sum(y_test==fit_test)/length(y_test)
## [1] 0.2051
## how often do my predicted test labels (fit_test) equal my true test labels (y_test)?
## Test error is about 0.21.
So, when k=1, the training error = 0 and the test error = 0.21... slightly better than decision trees. Not bad with 60-dimensional data.
So, let's try using k=5.
fit_test<-knn(x,x_test,y,k=5)
1-sum(y_test==fit_test)/length(y_test)
## [1] 0.2308
## 0.23
The error went up, so k=5 may be smoothing away some of the signal. Try k=3.
fit_test<-knn(x,x_test,y,k=3)
1-sum(y_test==fit_test)/length(y_test)
## [1] 0.1795
## 0.18
k=3 looks like a balance between over-fitting (k=1) and over-smoothing (k=5). But since the test set has now been used to choose k, this would be a good time to bring in another held-out data set; one alternative is sketched below.
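One possible approach, sketched here as an illustration rather than part of the original analysis: choose k by leave-one-out cross-validation on the training data with knn.cv() from the class package, keeping the test set untouched until the final evaluation. The candidate range of k values is an arbitrary choice.
## Leave-one-out CV error on the training data for k = 1..10 (illustrative sketch).
ks <- 1:10
cv_err <- sapply(ks, function(k) {
  pred <- knn.cv(train=x, cl=y, k=k)
  1-sum(y==pred)/length(y)
})
data.frame(k=ks, cv_error=round(cv_err,4))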