bm25: Best Matching (BM25) - Deprecated

bm25 R Documentation

Description

Compute BM25 similarity scores between sentences/documents.

Details

BM25 stands for Best Matching 25. It is a widely used function for ranking documents and is often preferred over TF-IDF scores. Given a new document, it is used to find the most similar documents in a corpus, and it is popular in information retrieval systems. This implementation can use multiple cores for parallel computation (see use_parallel).
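The scoring step behind BM25 can be sketched in base R. This is a minimal sketch, not superml's implementation: the tokenizer (lower-case, split on whitespace), the IDF variant, and the parameter values k1 = 1.5 and b = 0.75 are assumptions, and the function name bm25_scores is hypothetical.

```r
# Hypothetical BM25 scorer (sketch; k1, b, tokenizer and IDF variant are assumptions).
# score(D, Q) = sum over q in Q of
#   IDF(q) * f(q, D) * (k1 + 1) / (f(q, D) + k1 * (1 - b + b * |D| / avgdl))
bm25_scores <- function(corpus, query, k1 = 1.5, b = 0.75) {
  # Tokenize: lower-case, then split on runs of whitespace
  tokenize <- function(x) strsplit(tolower(x), "\\s+")[[1]]
  docs <- lapply(corpus, tokenize)
  q_terms <- unique(tokenize(query))

  n_docs <- length(docs)
  doc_len <- vapply(docs, length, integer(1))
  avgdl <- mean(doc_len)

  # IDF per query term; the "+ 1" inside log() keeps it non-negative
  idf <- vapply(q_terms, function(t) {
    n_t <- sum(vapply(docs, function(d) t %in% d, logical(1)))
    log((n_docs - n_t + 0.5) / (n_t + 0.5) + 1)
  }, numeric(1))

  # Sum the saturated, length-normalized term-frequency contributions per document
  vapply(seq_along(docs), function(i) {
    tf <- vapply(q_terms, function(t) sum(docs[[i]] == t), numeric(1))
    denom <- tf + k1 * (1 - b + b * doc_len[i] / avgdl)
    sum(idf * tf * (k1 + 1) / denom)
  }, numeric(1))
}
```

With the five example sentences used in the method examples and the query 'white toyota corolla', this sketch ranks 'nice audi bmw toyota corolla' first and 'white audi 2.5 car' second; the other three sentences share no query terms and score 0.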

Public fields

corpus

a list (or character vector) containing sentences

use_parallel

enables parallel computation, defaults to FALSE

Methods

Public methods


Method new()

Usage
bm25$new(corpus, use_parallel)
Arguments
corpus

list, a list (or character vector) containing sentences

use_parallel

logical, enables parallel computation, defaults to FALSE. If TRUE, uses n - 1 cores, where n is the number of available cores.
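The "n - 1 cores" rule can be illustrated with base R's parallel package. This is a hypothetical sketch: worker_count and score_parallel are illustrative names, and the parallel backend superml actually uses is not documented on this page. It leaves one core free and falls back to a single worker when detectCores() cannot determine the core count.

```r
library(parallel)

# Hypothetical: number of workers implied by use_parallel ("n - 1 cores")
worker_count <- function(use_parallel = FALSE) {
  if (!use_parallel) return(1L)
  n <- detectCores()
  # detectCores() may return NA on some platforms; fall back to one worker
  if (is.na(n)) return(1L)
  max(1L, as.integer(n) - 1L)
}

# Hypothetical: apply a per-document scoring function, in parallel if requested.
# score_fun must be self-contained, because PSOCK workers start with an empty session.
score_parallel <- function(docs, score_fun, use_parallel = FALSE) {
  n_workers <- worker_count(use_parallel)
  if (n_workers == 1L) {
    return(vapply(docs, score_fun, numeric(1), USE.NAMES = FALSE))
  }
  cl <- makeCluster(n_workers)  # PSOCK cluster: portable, also works on Windows
  on.exit(stopCluster(cl), add = TRUE)
  unlist(parLapply(cl, docs, score_fun))
}
```

Keeping one core free leaves the main R session responsive; for a small corpus, the cost of starting workers can outweigh the gain, which may be why the default is FALSE.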

Details

Create a new 'bm25' object.

Returns

A 'bm25' object.

library(superml)

example <- c('white audi 2.5 car', 'black shoes from office', 'new mobile iphone 7',
             'audi tyres audi a3', 'nice audi bmw toyota corolla')
obj <- bm25$new(example, use_parallel = FALSE)


Method most_similar()

Usage
bm25$most_similar(document, topn = 1)
Arguments
document

character, the document (query) for which the most similar sentences are found.

topn

integer, number of most similar sentences to return, defaults to 1

Details

Finds the topn sentences in the corpus that are most similar to document.

Returns

a vector of the most similar documents

library(superml)

example <- c('white audi 2.5 car', 'black shoes from office', 'new mobile iphone 7',
             'audi tyres audi a3', 'nice audi bmw toyota corolla')
get_bm <- bm25$new(example, use_parallel = FALSE)
input_document <- 'white toyota corolla'
get_bm$most_similar(document = input_document, topn = 2)
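Once every sentence in the corpus has a score, selecting the topn results is a sort. A minimal sketch in base R, assuming the scores are computed separately and that higher means more similar; top_n_documents is a hypothetical name, not part of superml, and the scores in the usage line are made up.

```r
# Hypothetical: return the topn corpus entries with the highest scores
top_n_documents <- function(corpus, scores, topn = 1) {
  stopifnot(length(corpus) == length(scores))
  # Never ask for more documents than the corpus holds
  topn <- min(topn, length(corpus))
  idx <- order(scores, decreasing = TRUE)[seq_len(topn)]
  # unlist() turns a list of sentences into a character vector
  unlist(corpus)[idx]
}

# Usage with made-up scores: returns c("new mobile iphone 7", "nice audi bmw toyota corolla")
top_n_documents(c('white audi 2.5 car', 'black shoes from office', 'new mobile iphone 7',
                  'audi tyres audi a3', 'nice audi bmw toyota corolla'),
                c(0.2, 0, 1.3, 0, 0.9), topn = 2)
```

Returning a character vector, rather than a list, matches the Returns entry above.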


Method clone()

The objects of this class are cloneable with this method.

Usage
bm25$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.


superml documentation built on Nov. 14, 2022, 9:05 a.m.