wersimtext: Run Text models on corpora that were modified using the...

View source: R/wersimtext.R

wersimtext    R Documentation

Run Text models on corpora that were modified using the WERSIM function

Description

This function simulates additional word error in a corpus, moving it from measured_wer to new_wer, and runs a text model (sentiment or Wordfish) on each simulated corpus.

Usage

wersimtext(x, measured_wer, new_wer, deletions_sim = 0.13,
  insertions_sim = 0.22, substitutions_sim = 0.65, num_sims,
  preprocessing = c("punctuation", "numbers"), mincount_wersim = 0,
  method, groupingvar_sim, direction = c(1, 2))

Arguments

x

A quanteda corpus to be modified

measured_wer

The word error rate (or an estimate thereof) of corpus x

new_wer

The word error rate for which the text model should be run. Generally, this is measured_wer plus some fixed incremental error such as 0.05

deletions_sim

The share of word error that should be introduced through deletions

insertions_sim

The share of word error that should be introduced through insertions

substitutions_sim

The share of word error that should be introduced through substitutions
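
To make the relationship between the three shares and the two error rates concrete, the sketch below splits the added error into its components. It assumes, without confirmation from this documentation, that the shares apply to the difference new_wer - measured_wer and are meant to sum to 1. The names additional_wer, shares and error_by_type are illustrative and not part of the package.

```r
# Default shares from the Usage section
deletions_sim     <- 0.13
insertions_sim    <- 0.22
substitutions_sim <- 0.65

measured_wer <- 0.00
new_wer      <- 0.05

# Assumption: the shares partition the *additional* error only
additional_wer <- new_wer - measured_wer
shares <- c(deletions     = deletions_sim,
            insertions    = insertions_sim,
            substitutions = substitutions_sim)

# Sanity check: the three shares should add up to 1
stopifnot(isTRUE(all.equal(sum(shares), 1)))

# Word error attributable to each edit type under this assumption
error_by_type <- additional_wer * shares
print(error_by_type)
```

If the shares did not sum to 1 under this reading, the simulated corpus would not land on new_wer, so a check like the one above may be worth running before a long simulation.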

num_sims

The number of simulations to be run

preprocessing

The preprocessing to apply to the corpus. Defaults to c("punctuation", "numbers"), which removes punctuation and numbers. Can also include "min_term", "stemming", "stopwords_en" and "stopwords_de".

mincount_wersim

If "min_term" is part of preprocessing, this parameter specifies the minimum number of times a word has to occur in the corpus to be retained in the dfm
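
As an illustration of what a minimum term count does, the base R sketch below drops words that occur fewer than mincount times. It is not the package's implementation. The toy token vector and the variable names are hypothetical, and it assumes the threshold refers to the total frequency of a word across the whole corpus.

```r
# Toy tokenised corpus (hypothetical data)
tokens <- c("tax", "budget", "tax", "health", "budget", "tax", "deficit")

# Plays the role of mincount_wersim
mincount <- 2

# Count how often each word occurs across the corpus
term_counts <- table(tokens)

# Keep only words that reach the threshold
kept_terms <- names(term_counts[term_counts >= mincount])
print(kept_terms)
```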

method

The text model that should be run on the simulated corpus. Can either be "sentiment" or "wordfish"

groupingvar_sim

The variable that groups the corpus

direction

For Wordfish: the values forwarded to the dir argument of Wordfish, which fixes the direction of the estimated space. Defaults to c(1,2).

Value

A data frame with the grouping variable in column 1 and simulated quantities (sentiment or Wordfish estimates) in subsequent columns.
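
A returned data frame of this shape can be summarised per group across simulations. The sketch below uses a made-up stand-in for the return value: the column names (party, sim_1, ...) and the numbers are hypothetical, and only the layout (grouping variable first, simulated quantities afterwards) is taken from the description above.

```r
# Hypothetical stand-in for the return value of wersimtext():
# grouping variable in column 1, one column per simulation afterwards
sim_res <- data.frame(
  party = c("A", "B", "C"),
  sim_1 = c(-1.0, 0.2, 1.1),
  sim_2 = c(-0.8, 0.0, 1.3),
  sim_3 = c(-1.2, 0.1, 0.9)
)

# Every column after the first holds simulated quantities
sims <- as.matrix(sim_res[, -1])

# Mean and standard deviation of the simulated quantity for each group
summary_res <- data.frame(
  group = sim_res[[1]],
  mean  = rowMeans(sims),
  sd    = apply(sims, 1, sd)
)
print(summary_res)
```

The spread across simulations gives a rough sense of how sensitive each group's estimate is to the added word error.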

Examples

library(RecordLinkage)
library(quanteda)
library(wersim)

# Baseline Wordfish positions on the unmodified corpus, grouped by party
corp <- corpus(data_corpus_dailnoconf1991)
dfm_corp <- dfm(corp, groups = "party")
wordfish_pos <- austin::wordfish(quanteda::as.wfm(dfm_corp))
wordfish_pos_res <- data.frame("theta" = wordfish_pos$theta, "party" = wordfish_pos$docs)

# Wordfish positions on 5 simulated corpora with a word error rate of 0.05
wersimulated_positions <- wersimtext(corp, measured_wer = 0, new_wer = 0.05,
  deletions_sim = 0.13, insertions_sim = 0.22, substitutions_sim = 0.65,
  num_sims = 5, method = "wordfish", groupingvar_sim = "party")

jenswaeckerle/wersim documentation built on Dec. 7, 2022, 9:31 a.m.