Symmetrization: Calculating Symmetric Word Alignment


Description

Calculates source-to-target and target-to-source alignments using IBM Model 1, then combines them into a symmetric word alignment by intersection, union, or grow-diag.
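
As a rough illustration of the directional step, the sketch below runs a few EM iterations of IBM Model 1 on a toy corpus. It is a minimal sketch, not the package's implementation: the names `ibm1` and `align_sentence`, the toy sentences, and the omission of the NULL word are all assumptions made for illustration.

```r
# Toy parallel corpus (hypothetical data, already tokenized).
src <- list(c("das", "haus"), c("das", "buch"), c("ein", "buch"))
tgt <- list(c("the", "house"), c("the", "book"), c("a", "book"))

# Hypothetical helper: EM estimation of P(target word | source word).
ibm1 <- function(src, tgt, iter = 4) {
  s_vocab <- unique(unlist(src))
  t_vocab <- unique(unlist(tgt))
  # Uniform initialization; rows are target words, columns are source words.
  t_prob <- matrix(1 / length(t_vocab), length(t_vocab), length(s_vocab),
                   dimnames = list(t_vocab, s_vocab))
  for (i in seq_len(iter)) {
    counts <- t_prob * 0
    for (k in seq_along(src)) {
      s <- src[[k]]
      for (tw in tgt[[k]]) {
        # E-step: split one count for tw across the source words of the pair.
        post <- t_prob[tw, s] / sum(t_prob[tw, s])
        for (j in seq_along(s)) {
          counts[tw, s[j]] <- counts[tw, s[j]] + post[j]
        }
      }
    }
    # M-step: renormalize so each source word's column sums to 1.
    t_prob <- sweep(counts, 2, colSums(counts), "/")
  }
  t_prob
}

# Hypothetical helper: link each target word to its most probable source word.
align_sentence <- function(s, t, t_prob) {
  sapply(t, function(tw) s[which.max(t_prob[tw, s])])
}

tp <- ibm1(src, tgt, iter = 4)
align_sentence(src[[1]], tgt[[1]], tp)
```

Swapping the roles of `src` and `tgt` gives the alignment in the opposite direction; those two directional results are what a symmetrization method combines.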

Usage

Symmetrization(file_train1, file_train2, 
               nrec = -1, iter = 4, ul_s = FALSE, ul_t = TRUE, 
               intrnt = TRUE, method = c("union", "intersection", "grow-diag")) 
               


## S3 method for class 'symmet'
print(x, ...) 

Arguments

file_train1

the name of the source-language file in the training set.

file_train2

the name of the target-language file in the training set.

nrec

number of sentences to be read. If -1, all sentences are read.

iter

number of iterations for IBM Model 1.

ul_s

logical. If TRUE, the first character of each source-language sentence is converted to lowercase. When the source language is written right-to-left, it can be FALSE.

ul_t

logical. If TRUE, the first character of each target-language sentence is converted to lowercase. When the target language is written right-to-left, it can be FALSE.

intrnt

logical. TRUE means that one of the two languages is a right-to-left language, so an internet connection is necessary.

method

symmetric word alignment method (union, intersection or grow-diag alignment).

x

an object of class "symmet".

...

further arguments passed to or from other methods.
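
The `method` default in the usage above is a character vector listing every allowed value. A common R idiom for such arguments is `match.arg`, which picks the first value when the argument is omitted and accepts unambiguous abbreviations. Whether `Symmetrization` resolves `method` this way is an assumption; `pick_method` below is a hypothetical helper that only demonstrates the idiom.

```r
# Hypothetical helper, not part of word.alignment.
pick_method <- function(method = c("union", "intersection", "grow-diag")) {
  match.arg(method)
}

pick_method()             # "union" (first element of the default vector)
pick_method("grow-diag")  # "grow-diag"
pick_method("inter")      # "intersection" (partial matching)
```

Under that assumption, passing `method` explicitly, as the example at the end of this page does, avoids relying on the default.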

Details

Here, word alignment is not merely a one-directional mapping from the target language to the source language. Instead, the source-to-target and target-to-source alignments are combined into a symmetric alignment by union, intersection, or grow-diag.
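
The three symmetric alignments can be sketched on a single sentence pair as follows. This is a simplified illustration following the textbook description of grow-diag in Koehn (2010), not the package's own code: the "source-target" string encoding of alignment points, the toy alignments, and the function `grow_diag` are assumptions.

```r
# Two directional alignments as (source position, target position) rows
# (hypothetical data).
s2t <- rbind(c(1, 1), c(2, 2), c(3, 2), c(5, 1))
t2s <- rbind(c(1, 1), c(2, 2), c(3, 3))

# Encode each alignment point as "source-target" so set operations apply.
key <- function(m) paste(m[, 1], m[, 2], sep = "-")

inter <- intersect(key(s2t), key(t2s))  # points found in both directions
uni   <- union(key(s2t), key(t2s))      # points found in either direction

# Hypothetical helper: start from the intersection and add neighbouring
# union points (diagonals included) while one of the two words is unaligned.
grow_diag <- function(inter, uni) {
  offsets <- expand.grid(ds = -1:1, dt = -1:1)
  offsets <- offsets[!(offsets$ds == 0 & offsets$dt == 0), ]
  aligned <- inter
  repeat {
    added <- FALSE
    for (p in aligned) {
      st <- as.integer(strsplit(p, "-")[[1]])
      for (k in seq_len(nrow(offsets))) {
        cs <- st[1] + offsets$ds[k]
        ct <- st[2] + offsets$dt[k]
        cand <- paste(cs, ct, sep = "-")
        if (cand %in% uni && !(cand %in% aligned)) {
          src_aligned <- any(startsWith(aligned, paste0(cs, "-")))
          tgt_aligned <- any(endsWith(aligned, paste0("-", ct)))
          if (!src_aligned || !tgt_aligned) {
            aligned <- c(aligned, cand)
            added <- TRUE
          }
        }
      }
    }
    if (!added) break  # stop once a full pass adds nothing
  }
  aligned
}

grown <- grow_diag(inter, uni)
```

In this toy case the intersection keeps two points, the union keeps five, and grow-diag lands in between: it adds "3-2" and "3-3" next to the intersection point "2-2" but leaves the isolated point "5-1" out.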

Value

Symmetrization returns an object of class "symmet".

An object of class "symmet" is a list containing the following components:

time

A number giving the running time (in seconds, minutes, or hours).

method

symmetric word alignment method (union, intersection or grow-diag alignment).

alignment

A list of alignments, one for each sentence pair.

Note

Note that this function is memory-intensive. Depending on the corpus size, only computers with a fast CPU and a large amount of RAM may be able to allocate the vectors it needs.

Author(s)

Neda Daneshgar and Majid Sarmad.

References

Koehn P. (2010), "Statistical Machine Translation", Cambridge University Press, New York.

http://statmt.org/europarl/v7/bg-en.tgz

See Also

word_alignIBM1

Examples

# Extracting bg-en.tgz from the Europarl corpus is time consuming,
# so the unzipped files have been made available at http://www.um.ac.ir/~sarmad/... .

## Not run: 

S1 = Symmetrization('http://www.um.ac.ir/~sarmad/word.a/euro.bg',
                    'http://www.um.ac.ir/~sarmad/word.a/euro.en',
                    nrec = 200, ul_s = TRUE, method = 'grow-diag',
                    intrnt = FALSE)

## End(Not run)

word.alignment documentation built on May 2, 2019, 4:58 p.m.