COMURE: The Use of Linguistic Variants in Translations vs....
In corregp: Functions and Methods for Correspondence Regression


This data set served as a case study in the COMURE project ("corpus-based, multivariate research of register variation in translated and non-translated Belgian Dutch"), which was conducted at the Department of Translation, Interpreting and Communication of Ghent University between 2010 and 2014.

A data frame with 3762 rows and 5 variables.

Variant: The linguistic variant used in a set of alternatives (27 levels).
Variable: The linguistic variable specifying a set of alternatives (13 levels).
Variety: The dichotomization of Variant into standard and non-standard.
Register: The register or "Text type" of the data (6 levels).
Language: The language (and source language) of the data (3 levels).

Delaere, I., G. De Sutter and K. Plevoets (2012) Is translated language more standardized than non-translated language? Target 24 (2), 203–224.

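library(corregp)
data(COMURE)

# The execution of corregp may be slow, due to bootstrapping (b = 3000 replications):
comure.crg <- corregp(Variant ~ Register * Language, data = COMURE, part = "Variable", b = 3000)
comure.crg

# Summary and scree plot with confidence intervals, then the ANOVA table for two dimensions
summary(comure.crg, parm = "b", add_ci = TRUE)
screeplot(comure.crg, add_ci = TRUE)
anova(comure.crg, nf = 2)

# Colour each variant blue if it occurs with Variety "Standard", red otherwise
comure.col <- ifelse(xtabs(~ Variant + Variety, data = COMURE)[, "Standard"] > 0, "blue", "red")
# Biplot with confidence ellipses for the Register and Language levels
plot(comure.crg, x_ell = TRUE, xsub = c("Register", "Language"), col_btm = comure.col,
  col_top = "black")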
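
The colour vector in the example above is built by cross-tabulating Variant against Variety and checking the "Standard" column. The following minimal sketch reproduces that step in base R on a small made-up data frame, so it runs without the package or the bootstrap. The object `toy`, its variant names and the label "Nonstandard" are hypothetical and only serve the illustration; only the level name "Standard" is taken from the example.

```r
# Toy stand-in for COMURE: hypothetical variants, not the real data
toy <- data.frame(
  Variant = c("a1", "a1", "a2", "b1", "b2", "b2"),
  Variety = c("Standard", "Standard", "Nonstandard",
              "Standard", "Nonstandard", "Nonstandard"),
  stringsAsFactors = TRUE
)

# Contingency table: one row per variant, one column per variety
tab <- xtabs(~ Variant + Variety, data = toy)

# A variant is coloured blue if it has at least one "Standard" observation
toy.col <- ifelse(tab[, "Standard"] > 0, "blue", "red")
toy.col
```

The result is a character vector named by variant, in the order of the factor levels of Variant. That order is presumably what lets it line up with the Y levels when it is passed as `col_btm` to the plot method.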