ctk-package: R-Package 'ctk' (Corpus Toolkit).


Description

Tools for corpus preparation.

Details

The ctk package relies on external tools such as the TreeTagger. It reads the location of these tools from environment variables, which can be set in the .Renviron file in the user's home directory. The .Renviron file might contain the following lines:

PATH_SAXON='/opt/saxon/saxon9he.jar'
PATH_TREETAGGER='/opt/treetagger'
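To confirm that R actually sees these variables after a restart, a small check along the following lines may help. This is a minimal sketch in base R: the helper name `check_ctk_paths` is hypothetical and not part of ctk, and the only things taken from the package documentation are the variable names PATH_SAXON and PATH_TREETAGGER.

```r
# Hypothetical helper (not exported by ctk): report, for each environment
# variable, its value and whether it points to an existing file or directory.
check_ctk_paths <- function(vars = c("PATH_SAXON", "PATH_TREETAGGER")) {
  # unset = NA makes unset variables distinguishable from empty ones
  values <- unname(Sys.getenv(vars, unset = NA))
  # file.exists("") is FALSE, so unset variables are reported as not found
  found <- !is.na(values) & file.exists(ifelse(is.na(values), "", values))
  data.frame(
    variable = vars,
    value = values,
    exists = found,
    stringsAsFactors = FALSE
  )
}

print(check_ctk_paths())
```

A row with `exists` equal to FALSE suggests that the variable is unset or that the path in .Renviron does not match the local installation. Changes to .Renviron only take effect in a new R session.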

The Saxon XSLT processor is available here: http://sourceforge.net/projects/saxon/files/ For the TreeTagger, see: http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/

Author(s)

Andreas Blaette

Examples

## Not run: 
taz <- new("pipe", projectDir = "/home/blaette/Data/pipeDirs/taz")
taz <- setPaths(taz)
filesCopied <- getFiles(
  taz, sourceDir = "/home/blaette/Lab/rsync/taz/html_out", targetDir = "xml",
  pattern = "xml", method = "list.files", recursive = TRUE, rectify = FALSE,
  verbose = FALSE, progress = TRUE
)
tokenize(taz, sourceDir = "xml", targetDir = "tok", progress = TRUE, mc = 3)
treetagger(taz, sourceDir = "tok", "vrt", progress = TRUE, mc = 3)
fix(taz, sourceDir = "vrt", targetDir = "vrt2", mc = 8)
encode(taz, corpus = "taz2", sourceDir = "vrt5", sample = 500, embedding = "10", encoding = "utf8")

## End(Not run)

PolMine/ctk documentation built on May 8, 2019, 3:20 a.m.