wersimtext | R Documentation |
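This function simulates additional word error in a corpus, raising its word error rate from measured_wer to new_wer, and re-runs a text model (sentiment or Wordfish) on each simulated corpus.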
wersimtext(x, measured_wer, new_wer, deletions_sim = 0.13, insertions_sim = 0.22, substitutions_sim = 0.65, num_sims, preprocessing = c("punctuation", "numbers"), mincount_wersim = 0, method, groupingvar_sim, direction = c(1, 2))
x |
A quanteda corpus to be modified |
measured_wer |
The word error rate (or an estimate thereof) of corpus x |
new_wer |
The word error rate for which the text model should be run. Generally, this is measured_wer plus some fixed incremental error such as 0.05 |
deletions_sim |
The share of word error that should be introduced through deletions |
insertions_sim |
The share of word error that should be introduced through insertions |
substitutions_sim |
The share of word error that should be introduced through substitutions |
num_sims |
The number of simulations to be run |
preprocessing |
The preprocessing to apply to the corpus. Defaults to c("punctuation", "numbers"), which removes punctuation and numbers; can also include "min_term", "stemming", "stopwords_en", and "stopwords_de". |
mincount_wersim |
If "min_term" is part of preprocessing, this parameter specifies the minimum number of times a word must occur in the corpus to be retained in the dfm |
method |
The text model that should be run on the simulated corpus. Can either be "sentiment" or "wordfish" |
groupingvar_sim |
The variable that groups the corpus |
direction |
For Wordfish: the values forwarded to the dir argument of Wordfish, which fixes the direction of the estimated space. Defaults to c(1, 2). |
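To make the relationship between measured_wer, new_wer, and the three share arguments concrete, the following base R sketch divides an added error budget among deletions, insertions, and substitutions. It is only an illustration and not the package's internal code: the helper split_added_error and its n_words argument are hypothetical, and it assumes that the added error equals (new_wer - measured_wer) times the number of words and that the three shares sum to 1.

```r
# Illustrative sketch only; NOT the implementation used by wersimtext().
# split_added_error() and n_words are hypothetical names.
split_added_error <- function(n_words, measured_wer, new_wer,
                              deletions_sim = 0.13,
                              insertions_sim = 0.22,
                              substitutions_sim = 0.65) {
  shares <- c(deletions = deletions_sim,
              insertions = insertions_sim,
              substitutions = substitutions_sim)
  # Assumption: the target WER is not below the measured WER,
  # and the three shares sum to 1.
  stopifnot(new_wer >= measured_wer,
            isTRUE(all.equal(sum(shares), 1)))
  # Assumption: added errors are counted relative to the corpus length.
  added_errors <- (new_wer - measured_wer) * n_words
  round(shares * added_errors)
}

counts <- split_added_error(n_words = 10000, measured_wer = 0, new_wer = 0.05)
print(counts)
```

With the default shares, most of the added error in this sketch is allocated to substitutions, followed by insertions and then deletions.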
A data frame with the grouping variable in column 1 and simulated quantities (sentiment or Wordfish estimates) in subsequent columns.
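Given that layout, one way to summarise the simulations is to compute a mean and standard deviation per group across the simulation columns. The sketch below uses a small made-up data frame in place of real output; the column names sim1 to sim3 and all values are hypothetical and only mimic the documented structure (grouping variable first, simulated quantities after).

```r
# Toy stand-in for the returned data frame; values and column names are
# invented for illustration and are not real wersimtext() output.
sims <- data.frame(party = c("A", "B", "C"),
                   sim1 = c(-1.0, 0.1, 0.9),
                   sim2 = c(-0.8, 0.0, 1.0),
                   sim3 = c(-1.2, 0.2, 1.1))

# Drop the grouping column and summarise each row across simulations.
est <- as.matrix(sims[, -1])
summary_df <- data.frame(group = sims[[1]],
                         mean = rowMeans(est),
                         sd = apply(est, 1, sd))
print(summary_df)
```

The spread across simulations gives a rough sense of how sensitive each group's estimate is to the added word error.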
library(RecordLinkage)
library(quanteda)
library(wersim)
corp <- corpus(data_corpus_dailnoconf1991)
dfm_corp <- dfm(corp, groups = "party")
wordfish_pos <- austin::wordfish(quanteda::as.wfm(dfm_corp))
wordfish_pos_res <- data.frame("theta" = wordfish_pos$theta, "party" = wordfish_pos$docs)
wersimulated_positions <- wersimtext(corp, measured_wer = 0, new_wer = 0.05,
                                     deletions_sim = 0.13, insertions_sim = 0.22,
                                     substitutions_sim = 0.65, num_sims = 5,
                                     method = "wordfish", groupingvar_sim = "party")