consExcel: Create an Excel File from Two Sentences

Description Usage Arguments Details Value Note Author(s) References See Also Examples

View source: R/consExcel.R

Description

It creates an excel file from two sentences of two languages to help the user for constructing a gold standard such that he/she can set 1 or 2 for sure or possible alignments.

Usage

1
2
3
4
consExcel(tst.set_sorc, tst.set_trgt, 
          method = c ("gold", "aligns"), 
          out1 = "gold.xlsx", out2 = "align.xlsx", 
          nrec = -1, minlen = 5, maxlen = 40, ul_s = FALSE, ul_t = TRUE)

Arguments

tst.set_sorc

the name of source language file in test set.

tst.set_trgt

the name of target language file in test set.

method

it consists of two arguments. If If "gold", it creates a separated excel file of test set to fill up its sheets with 1|2 for sure|possible alignment. If "aligns", it creates a separated excel file of test set to fill up its sheets with '3' as an alignment.

out1

the name of the excel file for gold standard.

out2

the name of the excel file for alignment.

nrec

number of sentences to be read. If -1, it considers all sentences.

minlen

a minimum length of sentences.

maxlen

a maximum length of sentences.

ul_s

logical. If TRUE, it will convert the first character of source language's sentences. When source language is a right-to-left, it can be FALSE.

ul_t

logical. If TRUE, it will convert the first character of target language's sentences. When target language is a right-to-left, it can be FALSE.

Details

The first step for evaluation of word alignment quality is creating a gold standard. This function makes an excel file with nrec sheets of a test set consists of source and target languages. Each sheet consists of the words of the target sentence as its rows and the words of the source sentence as its columns. To create a gold standard, it can be filled by Sure/Possible alignments (Sure = 1, Possible = 2).

Sometimes, the user calculates word alignments using another software or even another method and he/she wants to evaluate such alignment with this package. So, this function can help him/her in this way, it creates a separated excel file in "out2.xlsx" (as a default: "align.xlsx") and it can be filled by number 3 for alignments.

Value

One or two excel file in "out1" or "out2" file.

Note

If you have not the non-ascii problem, you can use fix.gold function instead.

Author(s)

Neda Daneshgar and Majid Sarmad.

References

Holmqvist M., Ahrenberg L. (2011), "A Gold Standard for English-Swedish Word Alignment.", NODALIDA 2011 Conference Proceedings, 106 - 113.

Och F., Ney H.(2003), "A Systematic Comparison Of Various Statistical Alignment Models.", 2003 Association for Computational Linguistics, J03-1002, 29(1).

See Also

fix.gold

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
## Not run: 

consExcel("http://www.um.ac.ir/~sarmad/word.a/source1.txt",
          "http://www.um.ac.ir/~sarmad/word.a/target1.txt",
           nrec = 5)

consExcel("http://www.um.ac.ir/~sarmad/word.a/source1.txt",
          "http://www.um.ac.ir/~sarmad/word.a/target1.txt", 
           nrec = 5, method = "aligns")

## End(Not run)

word.alignment documentation built on May 2, 2019, 4:58 p.m.