fromLATtoCYR: Latin to Cyrillic conversion function

View source: R/fromLATtoCYR.R

Description

This function converts transliterated (romanized) Cyrillic text back to the original Cyrillic script.
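As a rough illustration of what rule-based back-transliteration involves, the sketch below applies a small Latin-to-Cyrillic lookup table to a character vector. The `rules` table and the `back_transliterate()` helper are hypothetical and exist only for this example; they are not the package's implementation, and the package's own rules come from the files named in the Arguments section.

```r
# Hypothetical rule table for illustration only; it is NOT the content of
# the package's "transliterationLAOR.csv".
rules <- c(zh = "ж", ia = "я", a = "а", d = "д", e = "е", g = "г",
           m = "м", n = "н", o = "о", r = "р", t = "т", u = "у", z = "з")

# Hypothetical helper: replace each Latin pattern with its Cyrillic letter.
back_transliterate <- function(x, rules) {
  # Apply longer Latin patterns first so "zh" is not split into "z" + "h"
  for (latin in names(rules)[order(-nchar(names(rules)))]) {
    x <- gsub(latin, rules[[latin]], x, fixed = TRUE)
  }
  x
}

back_transliterate(c("mezhdunarodnaia", "gazeta"), rules)
```

Ordering the patterns by length is the main design point: multi-letter combinations have to be consumed before the single letters they contain.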

Usage

fromLATtoCYR(
  mdat = NULL,
  tolanguage = "Russian",
  LAOR = TRUE,
  OROR = FALSE,
  EnglishDetection = TRUE,
  EnglishLength = NULL,
  RussianCorrection = FALSE,
  SensitivityThreshold = 0.1
)

Arguments

mdat

character vector to be back-transliterated to Cyrillic.

tolanguage

language the text needs to be converted to ("Russian" by default)

LAOR

logical; if set to TRUE, applies the rules of transliteration from transliterated Cyrillic to original Cyrillic (the rules are listed in the file "transliterationLAOR.csv").

OROR

logical; if set to TRUE, applies the rules to correct transliterated original Cyrillic (the rules are listed in the file "transliterationOROR.csv").

EnglishDetection

if set to TRUE, the script does not transliterate words found in the English vocabulary (file: english.txt). If set to FALSE, only user-defined stop words are used (file: stopwordsfile.csv).

EnglishLength

threshold for EnglishDetection: words below the given threshold are ignored by EnglishDetection (NULL by default).

RussianCorrection

if set to TRUE, the script attempts to match every back-transliterated word with the Russian vocabulary (files: russian.txt and russian_surnames.txt).

SensitivityThreshold

used only if RussianCorrection is TRUE. It determines the algorithm's sensitivity to mismatches (values closer to 0 mean higher sensitivity to mismatches). Set to 0.1 by default.
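The optional arguments above can be combined in a single call. The following is a minimal sketch that assumes the HooverArchives package is installed; the input string is taken from the Examples section, and the object names `titles` and `corrected` are chosen only for illustration.

```r
library(HooverArchives)

# One romanized record, copied from the Examples section
titles <- c("Mezhdunarodnaia gazeta. Gl. redaktor: Iu. Zarechkin. Moscow, Russia. Semiweekly. 199?")

# Leave English vocabulary words untransliterated, then try to match each
# back-transliterated word against the Russian vocabulary.
corrected <- fromLATtoCYR(
  titles,
  tolanguage = "Russian",
  EnglishDetection = TRUE,
  RussianCorrection = TRUE,
  SensitivityThreshold = 0.1  # documented default; only used with RussianCorrection = TRUE
)
```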

Value

Returns a character vector containing the input text back-transliterated into Cyrillic.

Examples

library(HooverArchives)

# conversion to Russian
dat <- c(
  "Mezhdunarodnaia gazeta. Gl. redaktor: Iu. Zarechkin. Moscow, Russia. Semiweekly. 199?",
  "DEN' UCHITELIA komissiia po obrazovaniiu ob''edineniia Iabloko",
  "III-ii RIM vestnik Rossiiskogo patrioticheskogo dvizheniia. Redaktory: M. Artem'ev, V. Rugich. Moscow, Russia."
)

converteddata_ru <- fromLATtoCYR(dat, LAOR = TRUE, OROR = FALSE, EnglishDetection = TRUE)

# conversion to Ukrainian
dat <- read.csv(system.file("Ukraine_microform.csv", package = "HooverArchives"),
                sep = ",", encoding = "UTF-8", stringsAsFactors = FALSE)

converteddata_uk <- fromLATtoCYR(dat$FIELD.245, tolanguage = "Ukrainian")

kkalininMI/HooverArchives documentation built on Oct. 28, 2020, 10:16 a.m.