Tools for corpus preparation.
The ctk package relies on external tools such as the TreeTagger. It reads the location of these tools from environment variables, which can be set in the .Renviron file in the user's home directory. The .Renviron file might contain the following lines:
PATH_SAXON='/opt/saxon/saxon9he.jar'
PATH_TREETAGGER='/opt/treetagger'
The Saxon XSLT processor is available here: http://sourceforge.net/projects/saxon/files/
For the TreeTagger, see: http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/
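Before running a pipeline, it can help to confirm that R actually sees these variables. The following is a minimal sketch using only base R; `checkToolPaths` is a hypothetical helper and not part of the ctk package. It assumes the variable names PATH_SAXON and PATH_TREETAGGER shown above and reports whether each one is set and points to an existing file or directory.

```r
# Hypothetical helper (not part of ctk): inspect the environment variables
# that are expected to hold the locations of the external tools.
checkToolPaths <- function(vars = c("PATH_SAXON", "PATH_TREETAGGER")) {
  # unset = NA makes missing variables distinguishable from empty ones
  values <- unname(Sys.getenv(vars, unset = NA))
  isSet <- !is.na(values)
  # file.exists() is only meaningful for variables that are set
  pathExists <- isSet & file.exists(ifelse(isSet, values, ""))
  data.frame(
    variable = vars,
    value = values,
    isSet = isSet,
    exists = pathExists,
    stringsAsFactors = FALSE
  )
}

print(checkToolPaths())
```

Note that R reads .Renviron at startup, so after editing the file the session needs to be restarted (or the file re-read with `readRenviron("~/.Renviron")`) before the new values become visible.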
Andreas Blaette
## Not run:
taz <- new("pipe", projectDir = "/home/blaette/Data/pipeDirs/taz")
taz <- setPaths(taz)
filesCopied <- getFiles(
taz, sourceDir = "/home/blaette/Lab/rsync/taz/html_out", targetDir = "xml",
pattern = "xml", method = "list.files", recursive = TRUE, rectify = FALSE,
verbose = FALSE, progress = TRUE
)
tokenize(taz, sourceDir = "xml", targetDir = "tok", progress = TRUE, mc = 3)
treetagger(taz, sourceDir = "tok", targetDir = "vrt", progress = TRUE, mc = 3)
fix(taz, sourceDir = "vrt", targetDir = "vrt2", mc = 8)
encode(taz, corpus = "taz2", sourceDir = "vrt5", sample = 500, embedding = "10", encoding = "utf8")
## End(Not run)