bm25_score: Score a text corpus based on the Okapi BM25 algorithm
In rbm25: A Light Wrapper Around the 'BM25' 'Rust' Crate for Okapi BM25 Text Search

bm25_score

R Documentation

Description

A simple wrapper around the BM25 class: it scores each text in `data` against `query` and returns the scores.

Usage

bm25_score(data, query, lang = NULL, k1 = 1.2, b = 0.75)

Arguments

`data`	text data, a character vector. Any preprocessing (tolower, removing stopwords, etc.) must be applied before calling this function.
`query`	the term to search for. Any preprocessing that was applied to the text corpus (e.g., tolower, removing stopwords) must already have been applied to the term as well.
`lang`	language of the data, see the available_languages() method of the BM25 class; can also be "detect" to detect the language automatically. The default in the usage signature is NULL.
`k1`	k1 parameter of BM25, default is 1.2
`b`	b parameter of BM25, default is 0.75
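
Because `data` and `query` must go through the same preprocessing, it can help to route both through one function. The following is a minimal base R sketch; `preprocess()` and its three-word stopword list are hypothetical illustrations and not part of rbm25.

```r
# Hypothetical helper (not part of rbm25): lower-case the text, replace
# punctuation with spaces, and drop a small illustrative stopword list.
preprocess <- function(x, stopwords = c("the", "a", "an")) {
  x <- tolower(x)
  x <- gsub("[[:punct:]]+", " ", x)
  tokens <- strsplit(trimws(x), "[[:space:]]+")
  vapply(
    tokens,
    function(tok) paste(tok[!tok %in% stopwords], collapse = " "),
    character(1)
  )
}

corpus <- c(
  "The rabbit munched the orange carrot.",
  "The hedgehog impaled the orange orange."
)

# Apply the identical steps to the corpus and to the query.
clean_corpus <- preprocess(corpus)
clean_query <- preprocess("Orange")

# With rbm25 installed, the cleaned inputs would then be passed on:
# scores <- bm25_score(data = clean_corpus, query = clean_query)
```

Using one function for both inputs avoids a mismatch such as a lower-cased corpus being searched with a capitalised query.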

Value

a numeric vector of BM25 scores, one per element of `data`; higher values indicate higher relevance of the text to the query
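
To illustrate how `k1` and `b` shape the scores, here is a textbook Okapi BM25 computation in base R. It is a sketch only: `bm25_sketch()` is a hypothetical name, the tokenizer is a plain split on non-alphanumeric characters, and the IDF variant `log(1 + (N - n + 0.5) / (n + 0.5))` is an assumption. The 'Rust' crate behind `bm25_score()` may tokenize and weight terms differently, so the numbers need not match.

```r
# Textbook Okapi BM25 in base R; a sketch, not the rbm25 implementation.
bm25_sketch <- function(docs, query, k1 = 1.2, b = 0.75) {
  # Assumed tokenizer: lower-case, split on runs of non-alphanumerics.
  tokenize <- function(x) {
    tok <- strsplit(tolower(x), "[^[:alnum:]]+")[[1]]
    tok[nzchar(tok)]
  }
  doc_tokens <- lapply(docs, tokenize)
  query_terms <- unique(tokenize(query))
  n_docs <- length(docs)
  doc_len <- lengths(doc_tokens)
  avg_len <- mean(doc_len)

  scores <- numeric(n_docs)
  for (term in query_terms) {
    # Term frequency of `term` in each document.
    tf <- vapply(doc_tokens, function(tok) as.numeric(sum(tok == term)), numeric(1))
    # Inverse document frequency (assumed variant, always non-negative).
    n_containing <- sum(tf > 0)
    idf <- log(1 + (n_docs - n_containing + 0.5) / (n_containing + 0.5))
    # b controls how strongly document length is normalised;
    # k1 controls how quickly repeated occurrences saturate.
    length_norm <- 1 - b + b * doc_len / avg_len
    scores <- scores + idf * tf * (k1 + 1) / (tf + k1 * length_norm)
  }
  scores
}

corpus <- c(
  "The rabbit munched the orange carrot.",
  "The snake hugged the green lizard.",
  "The hedgehog impaled the orange orange.",
  "The squirrel buried the brown nut."
)
scores <- bm25_sketch(corpus, "orange")

# Rank the texts from most to least relevant.
ranking <- order(scores, decreasing = TRUE)
data.frame(text = corpus[ranking], score = scores[ranking])
```

In this sketch the third text, which contains "orange" twice, ranks first, and the texts without the term score zero. Because of `k1`, the second occurrence adds less than the first, so the score does not double.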

Examples

corpus <- c(
 "The rabbit munched the orange carrot.",
 "The snake hugged the green lizard.",
 "The hedgehog impaled the orange orange.",
 "The squirrel buried the brown nut."
)
scores <- bm25_score(data = corpus, query = "orange")
data.frame(text = corpus, scores_orange = scores)

rbm25 documentation built on April 12, 2025, 2:22 a.m.