BigKnn

BigKnn is part of HADES.

Introduction

BigKnn is an R package that implements a large-scale k-nearest neighbor (KNN) classifier using the Lucene search engine.

Features

Examples

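library(BigKnn)

# Sparse covariates in long format: one row per (rowId, covariateId) pair
covariates <- data.frame(rowIds = c(1,1,1,2,2,3),
                         covariateIds = c(10,11,12,10,11,12),
                         covariateValues = c(1,1,1,1,1,1))

# One outcome label per rowId
outcomes <- data.frame(rowIds = c(1,2,3),
                       y = c(1,0,0))

# Store both tables in a single Andromeda object
dataForPrediction <- Andromeda::andromeda(covariates = covariates,
                                          outcomes = outcomes)

# Folder on the local file system where the Lucene index is stored
indexFolder <- "s:/temp/lucene"

# Build the Lucene index from the labeled data
buildKnn(outcomes = dataForPrediction$outcomes,
         covariates = dataForPrediction$covariates,
         indexFolder = indexFolder)

# Predict outcomes from the index with k = 10 neighbors
prediction <- predictKnn(outcomes = dataForPrediction$outcomes,
                         covariates = dataForPrediction$covariates,
                         indexFolder = indexFolder,
                         k = 10,
                         weighted = TRUE)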
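
For readers new to KNN, the following base-R sketch illustrates what a similarity-weighted k-nearest-neighbor prediction computes on the toy data above. It is not how BigKnn works internally: BigKnn delegates the neighbor search to Lucene, whose scoring may differ. The helper names (`toMatrix`, `cosineSimilarity`, `sketchPredictKnn`), the choice of cosine similarity, and the `query` vector are all assumptions made for illustration.

```r
# Toy data in the same long format as the example above.
covariates <- data.frame(rowIds = c(1, 1, 1, 2, 2, 3),
                         covariateIds = c(10, 11, 12, 10, 11, 12),
                         covariateValues = c(1, 1, 1, 1, 1, 1))
outcomes <- data.frame(rowIds = c(1, 2, 3),
                       y = c(1, 0, 0))

# Pivot the long format into a dense row-by-covariate matrix (toy data only).
toMatrix <- function(covariates) {
  rows <- sort(unique(covariates$rowIds))
  cols <- sort(unique(covariates$covariateIds))
  m <- matrix(0, nrow = length(rows), ncol = length(cols),
              dimnames = list(as.character(rows), as.character(cols)))
  m[cbind(match(covariates$rowIds, rows),
          match(covariates$covariateIds, cols))] <- covariates$covariateValues
  m
}

# Cosine similarity between one query vector and every row of m.
cosineSimilarity <- function(m, query) {
  as.vector(m %*% query) / (sqrt(rowSums(m^2)) * sqrt(sum(query^2)))
}

# Hypothetical helper: (optionally similarity-weighted) mean outcome
# of the k rows most similar to the query.
sketchPredictKnn <- function(m, y, query, k, weighted = TRUE) {
  similarity <- cosineSimilarity(m, query)
  nearest <- order(similarity, decreasing = TRUE)[seq_len(min(k, nrow(m)))]
  weights <- if (weighted) similarity[nearest] else rep(1, length(nearest))
  sum(weights * y[nearest]) / sum(weights)
}

m <- toMatrix(covariates)
y <- outcomes$y[match(as.numeric(rownames(m)), outcomes$rowIds)]

# Hypothetical new row with covariates 10 and 11 present, 12 absent.
query <- c(1, 1, 0)
sketchPredictKnn(m, y, query, k = 2)
```

With `weighted = FALSE` the same helper reduces to a plain average over the k nearest outcomes. A dense matrix like this only works for small data; the point of an index-backed approach is to avoid materializing it.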

Technology

BigKnn is an R package that uses the Java-based Lucene search engine. The KNN index data is stored in a folder on the local file system.
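
Because the index lives in a local folder, it can help to prepare that folder explicitly before building. The sketch below is one portable way to do so in base R. The location under `tempdir()` is a hypothetical choice, and whether `buildKnn` would create a missing folder on its own is not stated here, so creating it up front is only a precaution.

```r
# Hypothetical location; any writable local folder should work.
# Note that tempdir() is removed when the R session ends, so use a
# permanent path if the index should outlive the session.
indexFolder <- file.path(tempdir(), "lucene")

# Create the folder (and any missing parent folders) if it does not exist yet.
if (!dir.exists(indexFolder)) {
  dir.create(indexFolder, recursive = TRUE)
}

# Confirm the folder is writable before building an index into it.
stopifnot(file.access(indexFolder, mode = 2) == 0)
```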

System Requirements

Running the package requires R with the rJava package installed, as well as Java 1.8 or higher.
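
A quick way to check these requirements from within R might look like the sketch below. The helpers `javaMajorVersion` and `checkJavaVersion` are hypothetical names introduced here, not part of BigKnn. The rJava calls `.jinit()` and `.jcall()` are used to read the `java.version` system property of the JVM that rJava starts.

```r
# Hypothetical helper: extract the major Java version from a version string.
# Legacy strings look like "1.8.0_292" (major version 8); newer ones look
# like "11.0.2" or "17".
javaMajorVersion <- function(versionString) {
  parts <- strsplit(versionString, "[._-]")[[1]]
  major <- suppressWarnings(as.integer(parts[1]))
  if (major == 1 && length(parts) > 1) {
    major <- suppressWarnings(as.integer(parts[2]))
  }
  major
}

# Hypothetical helper: stop with an error if rJava is missing or Java is too old.
checkJavaVersion <- function() {
  if (!requireNamespace("rJava", quietly = TRUE)) {
    stop("The rJava package is not installed.")
  }
  rJava::.jinit()
  versionString <- rJava::.jcall("java/lang/System", "S",
                                 "getProperty", "java.version")
  if (javaMajorVersion(versionString) < 8) {
    stop("Java 1.8 or higher is required, but found ", versionString)
  }
  invisible(versionString)
}

# The parsing step can be tried without a Java installation:
javaMajorVersion("1.8.0_292")
javaMajorVersion("11.0.2")
```

Calling `checkJavaVersion()` in a fresh R session then either returns the detected version string invisibly or stops with a message describing what is missing.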

Installation

  1. See the instructions here for configuring your R environment, including Java.

  2. Use the following commands in R to install the BigKnn package:

install.packages("remotes")
remotes::install_github("ohdsi/BigKnn")

User Documentation

Documentation can be found on the package website.

PDF versions of the documentation are also available:

* Package manual: BigKnn manual

Support

Contributing

Read here how you can contribute to this package.

License

BigKnn is licensed under Apache License 2.0. Lucene falls under its own Apache License 2.0.

Development

BigKnn is being developed in RStudio and Eclipse.

Development status

Stable.



OHDSI/BigKnn documentation built on March 21, 2023, 8:35 a.m.