R API to Java Data Mining Framework ELKI (https://elki-project.github.io/)
The goal of this project is to offer an R interface to the Java data mining framework ELKI. ELKI provides a large collection of highly parameterizable algorithms, with an emphasis on unsupervised methods in cluster analysis and outlier detection. For more information on ELKI, see here. This repository contains the ELKI jar version 0.75. If using this repository for research, please cite ELKI as explained here to give credit.
Currently, this interface implements twelve algorithms from the area of outlier detection. In case there's interest in using more of ELKI's algorithms from R, feel free to reach out!
The easiest way to use rElki in your R project is via devtools. To install devtools from CRAN, run install.packages("devtools")
in your R environment.
You can now use devtools' install_github
function to install the rElki
repository in your environment. Run devtools::install_github("lenaWitterauf/rElki")
to get the latest master version from this repository. In order to run rElki, you'll also need to install the R library rJava as well as a current version of Java on your computer, and then you're ready to go!
For this example let's assume we have a CSV file called some_data.csv
containing data points we want to calculate outliers for.
1. Load rElki - this will initialize the JVM
R
library(rElki)
2. Read the data from some_data.csv
into a dataframe
R
my_df <- read.csv("some_data.csv")
3. Run an outlier detection from rElki. For this example, let's use Fast ABOD with a neighbourhood size of 3
R
my_outlier_scores <- rElki::fast_abod(my_df, 3)
4. Print each observation in the dataset along with its outlier score
R
for(index in c(1:nrow(my_df))) {
print(paste('Observation:', paste(my_df[index,], collapse=',')))
print(paste('Score:', my_outlier_scores[index]))
}
R
library(rElki)
R
my_df <- replicate(5, rnorm(20))
2
R
my_outlier_scores <- rElki::odin(my_df, 2)
R
for(index in c(1:nrow(my_df))) {
print(paste('Observation:', paste(my_df[index,], collapse=',')))
print(paste('Score:', my_outlier_scores[index]))
}
This project, as the ELKI source code, is licenced under GNU AGPL 3. The ELKI jar in this repository includes parts that are licensed using different terms, such as the Apache License, BSD licenses or also public domain. It remains your responsibility to verify the license status when using or redistributing any of the files included. Using any of these files is WITHOUT ANY WARRANTY.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.