Tokenize (XML) files with one of the standard tools (treetagger, stanfordNLP, openNLP).
.Object | a ctk object
... | further parameters
lang | language of the files to be tagged
with | either "stanfordNLP", "treetagger" or "openNLP"
One potential problem with the Perl tokenizer that comes with the treetagger
is that its output is not valid XML. It is necessary to fix the XML with a
shell command such as the following, which replaces non-breaking spaces
(UTF-8 bytes C2 A0) with ordinary spaces:
for i in *; do sed 's/\xC2\xA0/ /g' "$i" > "../tok2/$i"; done
The XML may still not be valid (unescaped "&" etc.), so the fix method is still necessary.
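To illustrate the kind of cleanup involved, here is a minimal shell sketch that replaces non-breaking spaces and also escapes bare "&" characters. The function name fix_tok_xml is hypothetical and not part of the package, the sketch assumes GNU sed (for the \xHH byte escapes), and it is not a substitute for the package's fix method, whose exact behaviour is not described here.

```shell
# Hypothetical helper (not part of ctk): reads tokenizer output on stdin
# and writes the cleaned text to stdout. Assumes GNU sed.
fix_tok_xml() {
  # 1. replace UTF-8 non-breaking spaces (bytes C2 A0) with plain spaces
  # 2. escape every "&" as "&amp;"
  # 3. undo step 2 where the "&" already started an entity or character
  #    reference (&amp; &lt; &gt; &quot; &apos; &#160; &#xA0; ...)
  LC_ALL=C sed -E \
    -e 's/\xC2\xA0/ /g' \
    -e 's/&/\&amp;/g' \
    -e 's/&amp;(amp|lt|gt|quot|apos|#[0-9]+|#x[0-9A-Fa-f]+);/\&\1;/g'
}

# Usage sketch: clean every file in the current directory into ../tok2/
# for f in ./*; do fix_tok_xml < "$f" > "../tok2/$(basename "$f")"; done
```

Escaping everything first and then restoring known references avoids the need for lookahead, which sed does not support. Other sources of invalid XML, such as stray "<" characters, are not handled by this sketch.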