bm25: Best Matching (BM25)

Description

BM25 stands for Best Matching 25. It is widely used for ranking documents and is often preferred over TF-IDF scores. Given a new document, it finds the most similar documents in a corpus, which makes it popular in information retrieval systems. This implementation can use multiple cores for faster, parallel computation.
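
To make the ranking idea concrete, here is a minimal base-R sketch of the standard Okapi BM25 scoring formula. It is an illustration only and not necessarily how this package computes its scores: the function name bm25_scores, the whitespace tokenisation, the parameter values k1 = 1.2 and b = 0.75, and the IDF variant with +1 inside the logarithm are all assumptions.

```r
# Illustrative Okapi BM25 scorer in base R (hypothetical helper, not the
# package's internal code). Assumes whitespace tokenisation and the common
# defaults k1 = 1.2, b = 0.75.
bm25_scores <- function(corpus, query, k1 = 1.2, b = 0.75) {
  docs    <- strsplit(tolower(corpus), "\\s+")            # tokenise documents
  terms   <- unique(strsplit(tolower(query), "\\s+")[[1]]) # distinct query terms
  n_docs  <- length(docs)
  doc_len <- lengths(docs)                                 # tokens per document
  avg_len <- mean(doc_len)

  scores <- numeric(n_docs)
  for (term in terms) {
    # Term frequency of `term` in every document
    tf <- vapply(docs, function(d) sum(d == term), numeric(1))
    # Number of documents containing the term
    df <- sum(tf > 0)
    # IDF with +1 inside the log so the weight is never negative (assumption)
    idf <- log((n_docs - df + 0.5) / (df + 0.5) + 1)
    # Saturating term-frequency component with document-length normalisation
    scores <- scores + idf * tf * (k1 + 1) /
      (tf + k1 * (1 - b + b * doc_len / avg_len))
  }
  scores
}

corpus <- c("white audi 2.5 car", "black shoes from office",
            "new mobile iphone 7", "audi tyres audi a3",
            "nice audi bmw toyota corolla")
scores <- bm25_scores(corpus, "white toyota corolla")
print(round(scores, 3))
```

With this corpus the fifth document scores highest because it matches two of the query terms, and documents that share no term with the query score zero.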

Format

R6Class object.

Usage

For usage details see Methods, Arguments and Examples sections.

bm25 = bm25$new(corpus, use_parallel)
bm25$most_similar(input_document, topn)
bm25$compute(input_document)

Methods

$new()

Initialises an instance of the class. Pass the complete corpus of documents here.

$most_similar()

Returns the topn documents in the corpus that are most similar to the input document.

$compute()

Returns a similarity score for every document in the corpus, given an input sentence.
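
How the two query methods relate is not spelled out here. One plausible reading is that the top results are simply the documents with the highest similarity scores. The sketch below illustrates that selection step in base R; top_n_documents is a hypothetical helper, the scores are made up, and nothing here is taken from the package's source.

```r
# Hypothetical helper: given one score per document, return the `topn`
# documents with the highest scores (best first).
top_n_documents <- function(corpus, scores, topn = 1) {
  stopifnot(length(corpus) == length(scores))
  idx <- order(scores, decreasing = TRUE)  # indices sorted by descending score
  corpus[head(idx, topn)]                  # keep only the first `topn`
}

corpus <- c("doc a", "doc b", "doc c")
scores <- c(0.2, 1.5, 0.9)                 # made-up scores, illustration only
top2 <- top_n_documents(corpus, scores, topn = 2)
print(top2)
```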

Arguments

corpus

a list (or character vector, as in the example) of sentences that make up the corpus

use_parallel

logical; set to TRUE to enable parallel computation. Defaults to FALSE

Examples

library(superml)

# Corpus: each string is treated as one document
example <- c('white audi 2.5 car','black shoes from office',
             'new mobile iphone 7','audi tyres audi a3',
             'nice audi bmw toyota corolla')
get_bm <- bm25$new(example, use_parallel=FALSE)

# Retrieve the two documents most similar to the query
input_document <- c('white toyota corolla')
get_bm$most_similar(document = input_document, topn = 2)

ssi-ashraf/superml documentation built on Nov. 5, 2019, 9:18 a.m.