wer: Calculating the Word Error Rate

wer R Documentation

Calculating the Word Error Rate

Description

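This function calculates the word error rate (WER) of a hypothesis corpus against a reference corpus: the number of substitutions, deletions and insertions divided by the number of words in the reference.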

Usage

wer(r, h)

Arguments

r

The reference quanteda corpus

h

The hypothesis quanteda corpus

Value

Returns a data frame containing, for each text in the corpus, the word error rate, the number of substitutions, deletions and insertions, and the number of words in the reference and hypothesis corpora.
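The relationship between these values can be sketched in a few lines of base R. This is only an illustration: the column names below are hypothetical and are not taken from the actual output of wer(); the counts are those of the example on this page.

```r
# Hypothetical column names, chosen for illustration only;
# the real data frame returned by wer() may name them differently.
result_sketch <- data.frame(
  substitutions   = 1,
  deletions       = 1,
  insertions      = 1,
  words_reference = 10
)

# Word error rate = (substitutions + deletions + insertions) / reference words
result_sketch$wer <- with(
  result_sketch,
  (substitutions + deletions + insertions) / words_reference
)

result_sketch$wer  # 0.3
```

Because the denominator is the number of reference words, a hypothesis with many insertions can produce a rate above 1.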

Examples

# Hypothesis: the text that is being evaluated
hypothesis_data <- data.frame(
  text = "The meadoww very nice and the two sun shines bright",
  name = "doc1", stringsAsFactors = FALSE
)
hypothesis_corpus <- quanteda::corpus(hypothesis_data, docid_field = "name", text_field = "text")
# Reference: the text that the hypothesis is compared against
reference_data <- data.frame(
  text = "The meadow is very nice and the sun shines bright",
  name = "doc1", stringsAsFactors = FALSE
)
reference_corpus <- quanteda::corpus(reference_data, docid_field = "name", text_field = "text")
wer(r = reference_corpus, h = hypothesis_corpus)
# One substitution ("meadoww" instead of "meadow"), one deletion ("is") and one insertion ("two").
# Overall, this gives three errors for ten reference words, i.e. a word error rate of 0.3.
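
The count of three errors in the example above can be checked independently with a word-level edit distance. The following is a minimal sketch in base R, not the implementation used by wer(): the helper word_edit_distance() is hypothetical, and it assumes simple whitespace tokenization, which may differ from how the package tokenizes a quanteda corpus.

```r
# Hypothetical helper, not part of wersim: word-level Levenshtein distance.
# Assumes tokens are separated by whitespace (an assumption of this sketch).
word_edit_distance <- function(reference, hypothesis) {
  ref <- strsplit(reference, "\\s+")[[1]]
  hyp <- strsplit(hypothesis, "\\s+")[[1]]
  n <- length(ref)
  m <- length(hyp)
  # d[i + 1, j + 1] is the distance between the first i reference words
  # and the first j hypothesis words.
  d <- matrix(0, nrow = n + 1, ncol = m + 1)
  d[, 1] <- 0:n
  d[1, ] <- 0:m
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      cost <- if (ref[i] == hyp[j]) 0 else 1
      d[i + 1, j + 1] <- min(
        d[i, j] + cost,   # match or substitution
        d[i, j + 1] + 1,  # deletion: reference word missing from hypothesis
        d[i + 1, j] + 1   # insertion: extra word in hypothesis
      )
    }
  }
  d[n + 1, m + 1]
}

reference_text  <- "The meadow is very nice and the sun shines bright"
hypothesis_text <- "The meadoww very nice and the two sun shines bright"

errors <- word_edit_distance(reference_text, hypothesis_text)
n_reference_words <- length(strsplit(reference_text, "\\s+")[[1]])
wer_check <- errors / n_reference_words
wer_check  # 0.3
```

This sketch only returns the total number of errors; splitting that total into substitutions, deletions and insertions, as wer() reports, would additionally require tracing back through the matrix.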

jenswaeckerle/wersim documentation built on Dec. 7, 2022, 9:31 a.m.