In MickyDowns/mine_algorithms: What the package does (short line)

KMeans

Clustering cluster.csv

Use kmeans() with default settings to find the k=2 solution for the two-dimensional data in cluster.csv.

## read the file by relative path instead of changing the working directory
data<-read.csv("./data/cluster.csv",header=FALSE)

Plot initial data.

plot(data)

Cluster: kmeans() returns the cluster centers, the cluster assignment of each row, and the within-cluster sums of squares.

fit<-kmeans(data,2)
fit
fit$centers
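
The other pieces of the kmeans() return value can be inspected the same way. A minimal sketch, assuming synthetic data in place of cluster.csv (toy, toy_fit, and the seed are made up for illustration):

```r
set.seed(1)  ## hypothetical seed, only so the sketch is repeatable
## synthetic stand-in for cluster.csv: two well-separated 2-D blobs of 50 points each
toy <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
             matrix(rnorm(100, mean = 6), ncol = 2))
toy_fit <- kmeans(toy, centers = 2, nstart = 10)

toy_fit$centers       ## one row per cluster, one column per attribute
toy_fit$cluster       ## cluster number (1 or 2) for each row of toy
toy_fit$size          ## number of points in each cluster
toy_fit$withinss      ## within-cluster sum of squares, one value per cluster
toy_fit$tot.withinss  ## sum of withinss
```

nstart = 10 reruns the algorithm from 10 random starts and keeps the best fit, which makes the result less sensitive to the initial centers.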

Plot: the key is to color the points straight from the kmeans() output (fit$cluster) rather than splitting the data into separate objects.

plot(data$V1,data$V2,col=fit$cluster)
points(fit$centers,col=c("black","red"),pch=19)

Clustering sonar

Use kmeans() with default settings to find the k=2 solution for the sonar data, starting with its first two attributes.

## read the files by relative path instead of changing the working directory
train<-read.csv("./data/sonar_train.csv",header=FALSE)
test<-read.csv("./data/sonar_test.csv",header=FALSE)

Plot just the first two columns of the sonar data.

plot(train[,1:2])

Cluster: kmeans() can use as many attributes as you want, but start with the clusters formed by the first two.

fit<-kmeans(train[,1:2],2)
fit

Plot: the key is to color the points straight from the kmeans() output (fit$cluster) rather than splitting the data into separate objects.

plot(train[,1:2],col=fit$cluster)
points(fit$centers,col="blue",pch=19)

Compare sonar clusters to actual class labels

plot(train[,1:2],pch=19,xlab=expression(x[1]),
     ylab=expression(x[2]))
## the class labels are in column 61
y<-train[,61]
## re-plot the points with color based on class label (for y of -1 or 1, 2+2*y is color index 0 or 4)
points(train[,1:2],col=2+2*y,pch=19)

Compute the misclass error

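What if we used kmeans() to classify? What would our misclass error be? Cluster numbers are arbitrary, so the clusters may map to the classes the wrong way round; compare both fractions.

## transform cluster labels (1's and 2's) to -1s and 1s
sum(fit$cluster*2-3==y)/length(y)   ## fraction that agree with y
sum(fit$cluster*2-3!=y)/length(y)   ## fraction misclassified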

Try it for all 60 columns

fit<-kmeans(train[,1:60],2)
sum(fit$cluster*2-3==y)/length(y)
sum(fit$cluster*2-3!=y)/length(y)
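
Since kmeans() numbers its clusters arbitrarily, the mapping from 1/2 to -1/1 may be flipped relative to the classes. A minimal sketch on made-up labels (y_toy and cluster_toy are hypothetical, not the sonar data) that reports the error under the better of the two mappings:

```r
## hypothetical toy labels: classes in {-1, 1}, clusters in {1, 2}
y_toy       <- c(-1, -1, -1, 1, 1, 1, 1, -1)
cluster_toy <- c( 2,  2,  2, 1, 1, 1, 2,  2)

pred     <- cluster_toy * 2 - 3     ## cluster 1 -> -1, cluster 2 -> 1
err      <- mean(pred != y_toy)     ## error under this mapping
err_best <- min(err, 1 - err)       ## error under the better of the two mappings

## cross-tabulation of clusters against classes
table(cluster = cluster_toy, class = y_toy)
err_best
```

Here the raw mapping disagrees with y_toy on 7 of 8 points, so the flipped mapping is the sensible one to report.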

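Try with more centroids. Disaster: with 10 clusters, fit$cluster*2-3 takes the values -1, 1, 3, ..., 17, so only clusters 1 and 2 can ever match a class label.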

fit<-kmeans(train[,1:60],10)
sum(fit$cluster*2-3==y)/length(y)
sum(fit$cluster*2-3!=y)/length(y)
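
With more than two clusters, one way to get a meaningful number is to label each cluster by its majority class before comparing. This is a sketch on made-up labels (y_toy and cluster_toy are hypothetical, not the sonar data), and it is a different technique from the 2*cluster-3 mapping used above:

```r
## hypothetical toy data: 3 clusters, class labels in {-1, 1}
y_toy       <- c(-1, -1, 1, 1, 1, -1, 1, 1, -1)
cluster_toy <- c( 1,  1, 1, 2, 2,  2, 3, 3,  3)

## rows are clusters, columns are classes
tab <- table(cluster_toy, y_toy)
## majority class of each cluster, in cluster order 1, 2, 3
majority <- as.numeric(colnames(tab)[apply(tab, 1, which.max)])
## give every point the majority class of its cluster
pred <- majority[cluster_toy]
mean(pred != y_toy)
```

Because the majority classes are chosen from the same labels they are scored against, this fraction is optimistic and tends to fall as the number of clusters grows.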

Gist: kmeans() is a good clustering tool, not a good prediction tool.

What is kmeans doing?

First code it manually.

x<-c(1,2,3,5,6,7,8)
## starting centers; center1[k] and center2[k] store the centers after each update
center1<-1
center2<-2

## k indexes the updates: iteration k reads the centers at k-1 and writes the centers at k
for (k in 2:10) {
     ## put in cluster1 all x's whose distance to center1 <= distance to center2
     cluster1<-x[abs(x-center1[k-1])<=abs(x-center2[k-1])]
     ## put in cluster2 all x's whose distance to center1 > distance to center2
     cluster2<-x[abs(x-center1[k-1])>abs(x-center2[k-1])]

     ## each new center is the mean of the values assigned to its cluster
     center1[k]<-mean(cluster1)
     center2[k]<-mean(cluster2)
}

center1
center2
cluster1
cluster2
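
The loop above always runs nine updates. A variant that stops as soon as the centers stop moving might look like this; it is a sketch that assumes neither cluster ever becomes empty, and the names c1, c2, and in1 are introduced here for illustration:

```r
x <- c(1, 2, 3, 5, 6, 7, 8)
c1 <- 1; c2 <- 2                        ## same starting centers as above

repeat {
  in1  <- abs(x - c1) <= abs(x - c2)    ## TRUE where x is at least as close to c1 (ties go to c1)
  new1 <- mean(x[in1])                  ## updated center of cluster 1
  new2 <- mean(x[!in1])                 ## updated center of cluster 2
  if (new1 == c1 && new2 == c2) break   ## centers unchanged: converged
  c1 <- new1; c2 <- new2
}

c(c1, c2)    ## final centers
x[in1]       ## members of cluster 1
x[!in1]      ## members of cluster 2
```

For this data the centers settle at 2 and 6.5 after two updates, with clusters 1, 2, 3 and 5, 6, 7, 8.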

Compare to kmeans()

x<-c(1,2,3,5,6,7,8)

fit<-kmeans(x,2)

plot(x,col=fit$cluster)
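
kmeans(x,2) picks random starting centers, so its cluster numbering can differ from run to run. To line it up with the hand-coded version, kmeans() can be given the same starting centers and the Lloyd algorithm. This is a sketch; fit_lloyd is a name made up here:

```r
x <- c(1, 2, 3, 5, 6, 7, 8)

## start from the same centers (1 and 2) as the manual loop;
## algorithm = "Lloyd" alternates assignment and mean updates like the loop does
fit_lloyd <- kmeans(x, centers = c(1, 2), algorithm = "Lloyd")

fit_lloyd$centers   ## final centers
fit_lloyd$cluster   ## cluster number for each element of x
```

With fixed starting centers the result is deterministic and should reproduce the manual centers and clusters.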

Calc distances

x1<-c(2,2)
x2<-c(5,7)

data<-matrix(c(x1,x2),nrow=2,byrow=TRUE)
data

dist(data)
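
By default dist() returns the Euclidean distance between rows. A short check against the formula written out by hand (the name manual is introduced here for illustration):

```r
x1 <- c(2, 2)
x2 <- c(5, 7)

## Euclidean distance: square root of the summed squared differences,
## here sqrt((2-5)^2 + (2-7)^2) = sqrt(34)
manual <- sqrt(sum((x1 - x2)^2))
manual

## same value from dist(), which works on the rows of a matrix
as.numeric(dist(rbind(x1, x2)))
```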

MickyDowns/mine_algorithms documentation built on May 8, 2019, 10:49 a.m.
