Symmetrization: Calculating Symmetric Word Alignment
In word.alignment: Finding Word Alignment Using IBM model 1 for a Given Parallel Corpus and Its Evaluation


Computes source-to-target and target-to-source alignments using IBM Model 1, then combines them into a symmetric word alignment using the intersection, union, or grow-diag method.

Symmetrization(file_train1, file_train2, 
               nrec = -1, iter = 4, ul_s = FALSE, ul_t = TRUE, 
               intrnt = TRUE, method = c("union", "intersection", "grow-diag")) 
               


## S3 method for class 'symmet'
print(x, ...)

`file_train1`	the name of source language file in training set.
`file_train2`	the name of target language file in training set.
`nrec`	number of sentences to be read. If -1, all sentences are read.
`iter`	number of iterations for IBM Model 1.
`ul_s`	logical. If TRUE, the first character of each source-language sentence is converted. When the source language is written right-to-left, this can be FALSE.
`ul_t`	logical. If TRUE, the first character of each target-language sentence is converted. When the target language is written right-to-left, this can be FALSE.
`intrnt`	logical. TRUE means that one of the two languages is written right-to-left, in which case an internet connection is required.
`method`	symmetric word alignment method (union, intersection or grow-diag alignment).
`x`	an object of class `"symmet"`.
`...`	further arguments passed to or from other methods.
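
As a rough illustration of what the `iter` argument controls, the following sketch runs a few EM iterations of IBM Model 1 on a tiny made-up corpus. It is not the package's implementation: the corpus, the variable names (`src`, `tgt`, `t_prob`, `n_iter`) and the omission of the NULL word are all assumptions made for brevity, and the code assumes no word is repeated within a sentence.

```r
# Toy parallel corpus (hypothetical data), already tokenised.
src <- list(c("das", "haus"), c("das", "buch"), c("ein", "buch"))
tgt <- list(c("the", "house"), c("the", "book"), c("a", "book"))

src_vocab <- unique(unlist(src))
tgt_vocab <- unique(unlist(tgt))

# t_prob[e, f] approximates P(target word e | source word f); start uniform.
t_prob <- matrix(1 / length(tgt_vocab),
                 nrow = length(tgt_vocab), ncol = length(src_vocab),
                 dimnames = list(tgt_vocab, src_vocab))

n_iter <- 4  # plays the role of `iter` in this sketch
for (it in seq_len(n_iter)) {
  count <- t_prob * 0  # expected counts for every (e, f) pair
  for (k in seq_along(src)) {
    f_words <- src[[k]]
    e_words <- tgt[[k]]
    for (e in e_words) {
      # E-step: share one count for e among the source words of this sentence,
      # in proportion to the current translation probabilities.
      p <- t_prob[e, f_words]
      count[e, f_words] <- count[e, f_words] + p / sum(p)
    }
  }
  # M-step: renormalise so that each column (one source word) sums to 1.
  t_prob <- sweep(count, 2, colSums(count), "/")
}

print(round(t_prob, 3))
```

With more iterations the probability mass in each column concentrates on fewer target words, which is why a small `iter` is often enough for a first alignment.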

Here, word alignment is not just a one-directional mapping from the target language to the source language. Instead, the alignments from both directions are combined into a symmetric alignment using the union, intersection, or grow-diag method.
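
The difference between the union and intersection methods can be sketched with base R set operations. The link sets below are invented for illustration, and encoding each link as an "i-j" string (source position i, target position j) is an assumption of this sketch, not necessarily the format returned by `Symmetrization`.

```r
# Hypothetical alignment links for one sentence pair, one set per direction.
src_to_tgt <- c("1-1", "2-2", "3-2")  # links from the source-to-target model
tgt_to_src <- c("1-1", "2-2", "3-3")  # links from the target-to-source model

# Intersection keeps only the links both directions agree on.
sym_intersection <- intersect(src_to_tgt, tgt_to_src)

# Union keeps every link proposed by either direction.
sym_union <- union(src_to_tgt, tgt_to_src)

print(sym_intersection)
print(sym_union)
```

The grow-diag heuristic described by Koehn (2010) sits between the two: it starts from the intersection and adds neighbouring links taken from the union.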

Symmetrization returns an object of class "symmet".

An object of class "symmet" is a list containing the following components:

`time`	A number giving the running time (in seconds, minutes, or hours).
`method`	the symmetric word alignment method used (union, intersection, or grow-diag).
`alignment`	A list of alignments, one for each sentence pair.

Note that this function is memory-intensive. Depending on the corpus size, only machines with a powerful CPU and a large amount of RAM may be able to allocate the vectors it needs.

Neda Daneshgar and Majid Sarmad.

Koehn P. (2010), "Statistical Machine Translation", Cambridge University Press, New York.

http://statmt.org/europarl/v7/bg-en.tgz

word_alignIBM1

# Extracting bg-en.tgz from the Europarl corpus is time consuming,
# so the extracted files have been made available at http://www.um.ac.ir/~sarmad/... .

## Not run: 

S1 = Symmetrization('http://www.um.ac.ir/~sarmad/word.a/euro.bg',
                    'http://www.um.ac.ir/~sarmad/word.a/euro.en',
                    nrec = 200, ul_s = TRUE, method = 'grow-diag',
                    intrnt = FALSE)

## End(Not run)