library(devtools)
install_github("weininghu1012/allaboutknn")
KNN (k-nearest neighbors) is a non-parametric method used for classification and regression. The main idea is that an object is classified according to the objects most similar to it, where similarity is measured by Euclidean distance. To visualize this, show_cluster() plots an example that shows how data points within the same category group together.
# You might need ggfortify to use the function
library(ggfortify)
library(allaboutknn)
iris = read.csv(file = '/Users/weininghu/Documents/study/ubc2015w1/stat545/dataset/iris.csv', header = TRUE)
show_cluster(iris)
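To make the distance idea concrete, here is a minimal sketch of classifying a single point by majority vote among its k nearest neighbours. It is not taken from allaboutknn: `knn_predict_one` is a hypothetical helper written for illustration, the query values are made up, and the built-in `iris` data set stands in for the CSV file used above.

```r
# Hypothetical helper (not part of allaboutknn): classify one query point
# by majority vote among its k nearest training rows.
knn_predict_one <- function(train_x, train_y, query, k = 3) {
  # Euclidean distance from the query to every training row
  diffs <- sweep(as.matrix(train_x), 2, as.numeric(query))
  dists <- sqrt(rowSums(diffs^2))
  # Indices of the k closest rows
  nearest <- order(dists)[seq_len(k)]
  # Count the labels among those rows; ties go to the first maximum
  votes <- table(train_y[nearest])
  names(votes)[which.max(votes)]
}

data(iris)                       # built-in copy of the iris data set
train_x <- iris[, 1:4]           # the four numeric measurements
train_y <- iris$Species          # the class labels
query <- c(5.0, 3.4, 1.5, 0.2)   # made-up measurements for a new flower

predicted <- knn_predict_one(train_x, train_y, query, k = 5)
print(predicted)
```

The sketch computes every distance on each call, which is fine for a data set of this size but would be slow for large ones.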
Why do we need to normalize data? Sometimes one feature in the data has a much larger range than the others. In that case, the distance may be dominated by that feature. When you normalize, you adjust the range of all features so that variables with larger ranges are not over-emphasised in the distance. There are many ways to scale your data. A good way to check the data range is to call summary() in R.
summary(iris)
# we apply normalize function
normalized_iris = as.data.frame(lapply(iris[1:4], normalize))
summary(normalized_iris)
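The effect of range on distance can be illustrated with a small sketch. It assumes min-max scaling, which is one common choice; the `min_max` helper is hypothetical and the package's own normalize() may be implemented differently. The `people` data frame is made up for the example.

```r
# Hypothetical min-max scaler; allaboutknn's normalize() may differ.
# Note: a constant column would cause a division by zero here.
min_max <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

# Made-up data: age spans decades, income spans thousands
people <- data.frame(
  age    = c(20, 60, 22, 40),
  income = c(50000, 50200, 51000, 60000)
)

# Raw Euclidean distances from person 1: income differences swamp age
raw_d <- as.matrix(dist(people))[1, ]

# Rescale every column to [0, 1], then recompute the distances
scaled <- as.data.frame(lapply(people, min_max))
scaled_d <- as.matrix(dist(scaled))[1, ]

print(raw_d)
print(scaled_d)
```

On the raw data, person 2 is the closest to person 1 even though they are 40 years apart, because the income gap is small. After scaling, person 3, who is close in both age and income, becomes the nearest neighbour.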