In lenaWitterauf/rElki: An R interface to Java package ELKI (Version 0.7.5)

rElki

R API to Java Data Mining Framework ELKI (https://elki-project.github.io/)

The goal of this project is to offer an R interface to the Java data mining framework ELKI. ELKI provides a large collection of highly parameterizable algorithms, with an emphasis on unsupervised methods in cluster analysis and outlier detection. For more information on ELKI, see the project website (https://elki-project.github.io/). This repository contains the ELKI jar, version 0.7.5. If you use this repository for research, please cite ELKI as explained on the ELKI website to give credit.

Currently, this interface implements twelve algorithms from the area of outlier detection. In case there's interest in using more of ELKI's algorithms from R, feel free to reach out!

The easiest way to use rElki in your R project is via devtools. To install devtools from CRAN, run install.packages("devtools") in your R environment.

You can now use devtools' install_github function to install the rElki repository in your environment. Run devtools::install_github("lenaWitterauf/rElki") to get the latest master version from this repository. In order to run rElki, you'll also need to install the R library rJava as well as a current version of Java on your computer, and then you're ready to go!
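The installation steps can be collected into one small script. This is a minimal sketch: `missing_packages` and `install_relki` are hypothetical helper names introduced here, not part of rElki, and Java itself still has to be installed separately on your system.

```R
# CRAN packages this README names as prerequisites.
cran_prereqs <- c("devtools", "rJava")

# Hypothetical helper (not part of rElki): return the names of the
# packages in `pkgs` that are not yet installed in this R library.
missing_packages <- function(pkgs) {
  installed <- rownames(utils::installed.packages())
  setdiff(pkgs, installed)
}

# Hypothetical helper: install any missing CRAN prerequisites,
# then install rElki from GitHub via devtools.
install_relki <- function() {
  to_install <- missing_packages(cran_prereqs)
  if (length(to_install) > 0) {
    utils::install.packages(to_install)
  }
  devtools::install_github("lenaWitterauf/rElki")
}

# Run once from an interactive session:
# install_relki()
```

The install call is left commented out so that sourcing the script has no side effects; call `install_relki()` yourself when you are ready to install.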

For this example, let's assume we have a CSV file called some_data.csv containing the data points we want to calculate outlier scores for.

1. Load rElki. This will initialize the JVM.

   ```R
   library(rElki)
   ```

2. Read the data from some_data.csv into a dataframe.

   ```R
   my_df <- read.csv("some_data.csv")
   ```

3. Run an outlier detection from rElki. For this example, let's use Fast ABOD with a neighbourhood size of 3.

   ```R
   my_outlier_scores <- rElki::fast_abod(my_df, 3)
   ```

4. Print each observation in the dataset along with its outlier score.

   ```R
   for (index in seq_len(nrow(my_df))) {
     print(paste('Observation:', paste(my_df[index, ], collapse = ',')))
     print(paste('Score:', my_outlier_scores[index]))
   }
   ```

As a second example, the same steps can be run on generated data instead of a CSV file.

1. Load rElki. This will initialize the JVM.

   ```R
   library(rElki)
   ```

2. Generate normally distributed data points.

   ```R
   my_df <- replicate(5, rnorm(20))
   ```

3. Run an outlier detection from rElki. For this example, let's use ODIN with a neighbourhood size of 2.

   ```R
   my_outlier_scores <- rElki::odin(my_df, 2)
   ```

4. Print each observation in the dataset along with its outlier score.

   ```R
   for (index in seq_len(nrow(my_df))) {
     print(paste('Observation:', paste(my_df[index, ], collapse = ',')))
     print(paste('Score:', my_outlier_scores[index]))
   }
   ```
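Instead of printing each observation in a loop, the scores can also be attached to the data as an extra column. The following is a minimal sketch, assuming (as the indexing in the examples suggests) that rElki returns one numeric score per row. So that it runs without rElki or Java, it uses random placeholder scores, which are not real outlier scores.

```R
# Generate 20 normally distributed data points with 5 dimensions each.
set.seed(42)
my_df <- replicate(5, rnorm(20))

# ASSUMPTION: placeholder scores so this snippet runs without rElki or Java.
# In real use, replace this line with a call such as:
#   my_outlier_scores <- rElki::odin(my_df, 2)
my_outlier_scores <- runif(nrow(my_df))

# Combine the observations and their scores into one data frame,
# with one row per observation and the score in the last column.
scored <- data.frame(my_df, score = my_outlier_scores)
print(scored)
```

Whether a high or a low score marks an outlier depends on the algorithm, so check the ELKI documentation for the method you use before sorting or thresholding the `score` column.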

This project, like the ELKI source code, is licensed under the GNU AGPL 3. The ELKI jar in this repository includes parts that are licensed under different terms, such as the Apache License, BSD licenses, or public domain. It remains your responsibility to verify the license status when using or redistributing any of the files included. Using any of these files is WITHOUT ANY WARRANTY.

lenaWitterauf/rElki documentation built on June 2, 2020, 9:24 p.m.
