Tokenize (XML) files with one of the standard tools (treetagger, stanfordNLP, openNLP).
| .Object | a ctk object | 
| ... | further parameters | 
| lang | language of the files to be tagged | 
| with | either "stanfordNLP", "treetagger" or "openNLP" | 
One potential problem with the Perl tokenizer that ships with treetagger
is that its output is not valid XML. The XML needs to be fixed with a
shell command such as
for i in $(ls); do sed 's/\xC2\xA0/ /g' $i > ../tok2/$i; done
which replaces UTF-8 non-breaking spaces (bytes C2 A0) with ordinary spaces.
The XML may still be invalid afterwards (for example because of unescaped
"&" characters), so the fix method is still necessary.
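The one-line loop above relies on GNU sed's `\x` escapes and breaks on file names that contain spaces. A more defensive variant might look like the following sketch. The function name `replace_nbsp` and its two directory arguments are hypothetical and not part of the package. It only replaces non-breaking spaces and does not address other sources of invalid XML such as "&".

```shell
#!/bin/sh
# Sketch only: replace UTF-8 non-breaking spaces (bytes C2 A0) with plain
# spaces in every regular file of a source directory, writing the results
# to a target directory. Function name and arguments are hypothetical.
replace_nbsp() {
    src_dir=$1
    out_dir=$2
    # Build the two-byte sequence with printf instead of sed's \x escapes,
    # which are a GNU extension.
    nbsp=$(printf '\302\240')
    mkdir -p "$out_dir"
    for f in "$src_dir"/*; do
        # Skip subdirectories and the literal pattern left by an empty glob.
        [ -f "$f" ] || continue
        # LC_ALL=C makes sed treat the input as raw bytes.
        LC_ALL=C sed "s/$nbsp/ /g" "$f" > "$out_dir/$(basename "$f")"
    done
}
```

Assuming the tokenized files are in a directory `tok`, calling `replace_nbsp tok tok2` would write the cleaned copies to `tok2` and leave the originals untouched.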