The goal of KMediansR is to group a set of objects so that objects in the same group (called a cluster) are more similar, in some sense, to each other than to objects in other groups. In k-medians clustering, we partition $n$ observations into $k$ clusters and compute the median of each cluster's points to determine the cluster center. The KMediansR package performs k-medians clustering on a dataset supplied by the user and returns the clustered data. The median is less sensitive to extreme values than the arithmetic mean used by k-means, so k-medians clustering is more robust to outliers than k-means. This makes the package useful for data that contain outliers.
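A quick way to see the difference in robustness is to compare the two kinds of center on a single cluster that contains one outlier. The following is a base-R illustration with made-up data. It is not part of the KMediansR API.

```r
# Illustrative data: four nearby points plus one outlier at (100, 100).
cluster <- matrix(
  c(1, 1,
    1, 2,
    2, 1,
    2, 2,
    100, 100),
  ncol = 2, byrow = TRUE
)

# k-means would use the coordinate-wise mean, which the outlier drags
# far away from the four nearby points.
mean_center <- colMeans(cluster)

# k-medians uses the coordinate-wise median, which stays with the bulk.
median_center <- apply(cluster, 2, median)

mean_center
#> [1] 21.2 21.2
median_center
#> [1] 2 2
```

A single outlier moves the mean from about 1.5 to 21.2 in each coordinate, while the median only moves to 2.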
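The three main functions in the package are:

- `distance`: takes an $m \times n$ array `X` of $m$ original observations in an $n$-dimensional space and a $p \times n$ array `medians` of $p$ points in the same $n$-dimensional space. It returns an $m \times p$ distance matrix. For each $i$ and $j$, the Manhattan distance between `X[i, ]` and `medians[j, ]` is computed and stored in the $(i, j)$th entry.
- `kmedians`: clusters the rows of `X` into `num_clusters` clusters. It returns a list whose first element is the matrix of cluster medians and whose second element is the vector of cluster labels.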
- `summary`: summarizes the result of running the `kmedians` function on the input data. It returns a data frame that contains information about the model run, such as the label and median coordinates of each cluster, the number of points in each cluster, and the average, minimum, and maximum distance between the points in each cluster and the cluster median.

A simple example demonstrating the functionality of this package:

```r
# load package
library(KMediansR)

# toy data with two clusters
toy_data <- matrix(
  c(1, 1, 1, 2, 2, 1, 100, 100, 101, 100, 100, 101),
  nrow = 6, ncol = 2, byrow = TRUE
)

# initial cluster centers (medians)
m <- matrix(c(1, 1, 100, 100), nrow = 2, ncol = 2, byrow = TRUE)

# calculate the Manhattan distance between the data points and the medians
manhattan_distance <- distance(X = toy_data, medians = m)
manhattan_distance
#>      [,1] [,2]
#> [1,]    0  198
#> [2,]    1  197
#> [3,]    1  197
#> [4,]  198    0
#> [5,]  199    1
#> [6,]  199    1

# cluster the data points
clustered <- kmedians(X = toy_data, num_clusters = 2)
clustered
#> [[1]]
#>      [,1] [,2]
#> [1,]    1    1
#> [2,]  100  100
#> 
#> [[2]]
#> [1] 1 1 1 2 2 2

# generate summary results
report <- summary(X = toy_data, medians = clustered[[1]], labels = clustered[[2]])
report
#>   Cluster.Label Median.Coordinates Number.of.Points.in.Cluster Average.Distance Minimum.Distance Maximum.Distance
#> 1             1                1,1                           3        0.6666667                0                1
#> 2             2            100,100                           3        0.6666667                0                1
```
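To clarify what each of the three functions computes, their behavior on the toy data can be mimicked in a few lines of base R. This is a sketch under stated assumptions, not the KMediansR implementation. The names `manhattan_distance_sketch`, `kmedians_sketch`, and `cluster_summary_sketch` are hypothetical. The sketch assumes the common alternating k-medians scheme: assign each point to its nearest median under Manhattan distance, then recompute each median coordinate-wise. It also assumes the loop starts from user-supplied initial medians rather than from `num_clusters`.

```r
# Hypothetical base-R sketch; not the KMediansR source code.

# Manhattan distance from every row of X to every row of medians.
# Returns an nrow(X) x nrow(medians) matrix.
manhattan_distance_sketch <- function(X, medians) {
  d <- matrix(0, nrow = nrow(X), ncol = nrow(medians))
  for (j in seq_len(nrow(medians))) {
    # sweep() subtracts medians[j, ] from each row of X
    d[, j] <- rowSums(abs(sweep(X, 2, medians[j, ])))
  }
  d
}

# Alternating k-medians starting from user-supplied initial medians
# (an assumption; the package's kmedians() takes num_clusters instead).
kmedians_sketch <- function(X, medians, max_iter = 100) {
  labels <- integer(nrow(X))
  for (iter in seq_len(max_iter)) {
    # Assignment step: each point joins the cluster with the nearest median.
    new_labels <- apply(manhattan_distance_sketch(X, medians), 1, which.min)
    if (identical(new_labels, labels)) break  # assignments stopped changing
    labels <- new_labels
    # Update step: each median becomes the coordinate-wise median of its points.
    for (k in seq_len(nrow(medians))) {
      members <- X[labels == k, , drop = FALSE]
      if (nrow(members) > 0) medians[k, ] <- apply(members, 2, median)
    }
  }
  # Same layout as the example output: [[1]] medians, [[2]] labels
  list(medians, labels)
}

# Per-cluster summary; assumes every cluster has at least one point.
cluster_summary_sketch <- function(X, medians, labels) {
  d <- manhattan_distance_sketch(X, medians)
  rows <- lapply(seq_len(nrow(medians)), function(k) {
    dk <- d[labels == k, k]  # distances from cluster k's points to its median
    data.frame(
      Cluster.Label = k,
      Median.Coordinates = paste(medians[k, ], collapse = ","),
      Number.of.Points.in.Cluster = length(dk),
      Average.Distance = mean(dk),
      Minimum.Distance = min(dk),
      Maximum.Distance = max(dk)
    )
  })
  do.call(rbind, rows)
}

# Usage on the toy data from the example
toy_data <- matrix(c(1, 1, 1, 2, 2, 1, 100, 100, 101, 100, 100, 101),
                   nrow = 6, ncol = 2, byrow = TRUE)
m <- matrix(c(1, 1, 100, 100), nrow = 2, ncol = 2, byrow = TRUE)
fit <- kmedians_sketch(toy_data, m)
fit[[2]]
#> [1] 1 1 1 2 2 2
cluster_summary_sketch(toy_data, fit[[1]], fit[[2]])
```

Passing the initial medians explicitly keeps the sketch deterministic. A full implementation that accepts `num_clusters` must also choose its starting medians, for example by sampling rows of `X`. That choice is left out of this sketch.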