lemma_corpus: Clean Corpus


Description

Lemmatizes a corpus using TreeTagger. The files in ipath are processed in parallel, with lemma_file run on each one, and the results are written to the output directory odir. Make sure the output directory either does not exist yet or is empty.
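The per-file parallel pattern described above can be sketched as follows. This is only an illustration under stated assumptions, not the package's implementation: `process_one` and `process_corpus` are hypothetical names, `process_one` is a stand-in that lower-cases text instead of calling TreeTagger, and the use of `parallel::mclapply` is an assumption about how the work might be distributed.

```r
library(parallel)

# Hypothetical stand-in for lemma_file(): writes a lower-cased copy of the
# input file instead of calling TreeTagger, so the sketch runs anywhere.
process_one <- function(ifile, odir) {
  ofile <- file.path(odir, basename(ifile))
  writeLines(tolower(readLines(ifile, warn = FALSE)), ofile)
  ofile
}

# Hypothetical driver illustrating the pattern; not the package's code.
process_corpus <- function(ipath, odir, ncores) {
  # Refuse to write into a directory that already has content.
  if (dir.exists(odir) &&
      length(list.files(odir, all.files = TRUE, no.. = TRUE)) > 0) {
    stop("output directory exists and is not empty: ", odir)
  }
  dir.create(odir, recursive = TRUE, showWarnings = FALSE)

  # Keep regular files only; subdirectories are skipped.
  files <- list.files(ipath, full.names = TRUE)
  files <- files[!dir.exists(files)]

  # mclapply forks workers on Unix-alikes; on Windows mc.cores must be 1.
  results <- parallel::mclapply(files, process_one, odir = odir,
                                mc.cores = ncores)
  unlist(results)
}
```

Checking the output directory before any worker starts means a stale directory is reported once, up front, rather than surfacing as overwritten or mixed results later.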

Usage

lemma_corpus(ipath, odir, ncores,
  cmd = "/opt/tree-tagger/bin/tree-tagger",
  param = "/opt/tree-tagger/lib/english.par")

Arguments

ipath

A string specifying the path to the input directory containing the text files to lemmatize.

odir

A string specifying the output directory path.

ncores

A number specifying the number of cores to use.

cmd

**optional** path to the TreeTagger binary on your system. Defaults to "/opt/tree-tagger/bin/tree-tagger".

param

**optional** path to the TreeTagger parameter file to use. Defaults to "/opt/tree-tagger/lib/english.par".
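Before a long run it can help to sanity-check the arguments. The sketch below is a hedged illustration: `pick_ncores` and `treetagger_paths_ok` are hypothetical helpers that are not part of the package, and whether lemma_corpus performs similar checks itself is not documented here.

```r
library(parallel)

# Hypothetical helper: cap the requested core count at what the machine reports.
pick_ncores <- function(requested) {
  available <- parallel::detectCores()
  if (is.na(available)) available <- 1L  # detectCores() may return NA
  max(1L, min(as.integer(requested), available))
}

# Hypothetical helper: TRUE when both TreeTagger paths point at existing files.
# path.expand() resolves a leading "~" as used in the examples.
treetagger_paths_ok <- function(cmd, param) {
  all(file.exists(path.expand(c(cmd, param))))
}
```

For instance, `pick_ncores(20)` returns 20 only on a machine that reports at least 20 cores, and `treetagger_paths_ok(cmd, param)` can be checked before passing custom `cmd` and `param` values.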

Examples

## Not run: 
lemma_corpus("/path/to/corpus/", "./lemmad/", 20)
lemma_corpus("corpus/", "lemmad/", 20, cmd="~/tt/bin/tree-tagger", param="~/tt/lib/english.par")

## End(Not run)

avkoehl/textprocessingDSI documentation built on June 5, 2019, 7:41 p.m.