read.corp.celex: Import Celex data

Description

Read data from Celex[1] formatted corpora.

Usage

read.corp.celex(
  celex.path,
  running.words,
  fileEncoding = "ISO_8859-1",
  n = -1,
  caseSens = TRUE
)

Arguments

celex.path

A character string, path to a frequency file in Celex format to read.

running.words

An integer value, number of running words in the Celex data corpus to be read.

fileEncoding

A character string naming the encoding of the Celex files.

n

An integer value, the maximum number of lines of data to read. The default of -1 reads all lines.

caseSens

Logical, if FALSE forces all frequency statistics to be calculated regardless of the tokens' case. Otherwise, if the imported database supports it, you will get different frequencies for the same tokens in different cases (e.g., "one" and "One").
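
The effect described for caseSens can be illustrated with a small base R sketch. It is not taken from the koRpus sources and does not call read.corp.celex; the data frame and its values are hypothetical toy data, and the object names (freq, case.sensitive, case.insensitive) are made up for this illustration. It only shows the general idea of keeping "one" and "One" apart versus merging their counts.

```r
# Hypothetical toy frequency table; these are NOT real Celex values.
freq <- data.frame(
  word = c("one", "One", "two"),
  freq = c(10, 3, 5),
  stringsAsFactors = FALSE
)

# Case-sensitive view: "one" and "One" remain separate entries.
case.sensitive <- freq

# Case-insensitive view: lower-case all tokens first,
# then sum the counts of tokens that have become identical.
case.insensitive <- aggregate(
  freq ~ word,
  data = transform(freq, word = tolower(word)),
  FUN = sum
)

print(case.sensitive)
print(case.insensitive)
```

In the case-insensitive table the two spellings collapse into a single row whose count is the sum of both, which is roughly the kind of difference to expect between caseSens=TRUE and caseSens=FALSE when the database distinguishes case.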

Value

An object of class kRp.corp.freq.

References

[1] http://celex.mpi.nl

See Also

kRp.corp.freq

Examples

## Not run: 
my.Celex.data <- read.corp.celex(
  file.path("~","mydata","Celex","GERMAN","GFW","GFW.CD"),
  running.words=5952000
)
freq.analysis(
  tokenized.obj,
  corp.freq=my.Celex.data
)

## End(Not run)

koRpus documentation built on May 18, 2021, 1:13 a.m.