Description Usage Arguments Details Value Note Author(s) References See Also Examples
It calculates source-to-target and target-to-source alignments using IBM Model 1, and then combines them into a symmetric word alignment using the intersection, union or grow-diag method.
file_train1 | the name of the source language file in the training set.
file_train2 | the name of the target language file in the training set.
nrec | number of sentences to be read. If -1, all sentences are considered.
iter | number of iterations for IBM Model 1.
ul_s | logical. If TRUE, it will convert the first character of the source language's sentences. When the source language is a right-to-left language, it can be FALSE.
ul_t | logical. If TRUE, it will convert the first character of the target language's sentences. When the target language is a right-to-left language, it can be FALSE.
intrnt | logical. TRUE means that one of the two languages is a right-to-left language, so an internet connection is necessary.
method | symmetric word alignment method (union, intersection or grow-diag alignment).
x | an object of class "symmet".
... | further arguments passed to or from other methods.
Here, word alignment is not just a one-directional mapping from the target language to the source language; it is a symmetric alignment obtained by combining both directions with the union, intersection or grow-diag method.
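To make the three symmetrization methods concrete, the following is a minimal base-R sketch. It is not the package's implementation. It assumes each directional alignment is available as a two-column matrix of (source position, target position) links, which is an illustrative representation only, and the helper names (link_keys, keys_to_matrix, intersect_align, union_align, grow_diag_align) are hypothetical. The grow-diag part follows the general idea described in Koehn (2010) in simplified form: start from the intersection and repeatedly add neighbouring links from the union when they cover a source or target word that is still unaligned.

```r
# Encode each (source, target) link as one string key so that set operations work per link.
link_keys <- function(a) paste(a[, 1], a[, 2], sep = ":")

# Decode string keys back into a sorted two-column integer matrix.
keys_to_matrix <- function(keys) {
  if (length(keys) == 0) return(matrix(integer(0), ncol = 2))
  parts <- do.call(rbind, strsplit(keys, ":", fixed = TRUE))
  m <- matrix(as.integer(parts), ncol = 2)
  m[order(m[, 1], m[, 2]), , drop = FALSE]
}

# Intersection: links proposed by both directions.
intersect_align <- function(s2t, t2s) {
  keys_to_matrix(intersect(link_keys(s2t), link_keys(t2s)))
}

# Union: links proposed by at least one direction.
union_align <- function(s2t, t2s) {
  keys_to_matrix(union(link_keys(s2t), link_keys(t2s)))
}

# Simplified grow-diag: grow the intersection towards the union through
# horizontally, vertically and diagonally adjacent links.
grow_diag_align <- function(s2t, t2s) {
  current <- intersect(link_keys(s2t), link_keys(t2s))
  candidates <- union(link_keys(s2t), link_keys(t2s))
  neighbours <- expand.grid(di = -1:1, dj = -1:1)
  neighbours <- neighbours[!(neighbours$di == 0 & neighbours$dj == 0), ]
  repeat {
    added <- FALSE
    for (key in current) {
      ij <- as.integer(strsplit(key, ":", fixed = TRUE)[[1]])
      for (k in seq_len(nrow(neighbours))) {
        ni <- ij[1] + neighbours$di[k]
        nj <- ij[2] + neighbours$dj[k]
        new_key <- paste(ni, nj, sep = ":")
        # Skip links that are already present or that neither direction proposed.
        if (new_key %in% current || !(new_key %in% candidates)) next
        cur <- keys_to_matrix(current)
        # Add the link only if it covers a still-unaligned source or target word.
        if (!(ni %in% cur[, 1]) || !(nj %in% cur[, 2])) {
          current <- c(current, new_key)
          added <- TRUE
        }
      }
    }
    if (!added) break
  }
  keys_to_matrix(current)
}

# Toy directional alignments for one sentence pair (made-up data).
s2t <- rbind(c(1, 1), c(2, 2), c(3, 3), c(4, 1))
t2s <- rbind(c(1, 1), c(2, 2), c(2, 3), c(3, 4))

inter <- intersect_align(s2t, t2s)
uni   <- union_align(s2t, t2s)
grown <- grow_diag_align(s2t, t2s)
```

In this toy case the grown alignment contains every link of the intersection and is itself contained in the union, which is the general relationship between the three methods: intersection favours precision, union favours recall, and grow-diag sits in between.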
Symmetrization returns an object of class "symmet".
An object of class "symmet" is a list containing the following components:
time | A number giving the running time (in seconds, minutes or hours).
method | symmetric word alignment method (union, intersection or grow-diag alignment).
alignment | A list of alignments, one for each sentence pair.
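Because the result is an ordinary list with a class attribute, its components can be read with the usual `$` and `[[` operators. The sketch below builds a hand-made stand-in object so that it runs without a corpus; the values in it are placeholders, and the actual format of each element of alignment is not specified here, so it is represented by dummy strings.

```r
# Hypothetical stand-in for a value returned by Symmetrization();
# every value below is a placeholder, not real output.
res <- structure(
  list(
    time = 1.5,                 # placeholder running time
    method = "grow-diag",       # one of the documented methods
    alignment = list(           # one element per sentence pair (dummy content)
      "placeholder for sentence pair 1",
      "placeholder for sentence pair 2"
    )
  ),
  class = "symmet"
)

res$method              # the symmetrization method that was used
length(res$alignment)   # number of aligned sentence pairs
res$alignment[[1]]      # alignment for the first sentence pair
inherits(res, "symmet") # check the class before passing it on
```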
Note that this function is memory-intensive: only computers with a fast CPU and a large amount of RAM can allocate the vectors it needs. The exact requirements depend on the corpus size.
Neda Daneshgar and Majid Sarmad.
Koehn P. (2010), "Statistical Machine Translation", Cambridge University Press, New York.
http://statmt.org/europarl/v7/bg-en.tgz
word_alignIBM1
# Since the extraction of bg-en.tgz in the Europarl corpus is time consuming,
# the unzipped files have been exported to http://www.um.ac.ir/~sarmad/... .
## Not run:
S1 = Symmetrization('http://www.um.ac.ir/~sarmad/word.a/euro.bg',
                    'http://www.um.ac.ir/~sarmad/word.a/euro.en',
                    nrec = 200, ul_s = TRUE, method = 'grow-diag',
                    intrnt = FALSE)
## End(Not run)