cocaReadFreq: Read CoCA word frequency table.


Description

Use read.table to read the COCA word frequency table.

Usage

cocaReadFreq(file, sep = "\t", na.strings = "  ", quote = "\"",
  header = TRUE, fill = TRUE, skip = 2, simpleWC = TRUE, ...)

Arguments

file

Sent to read.table.

sep

The CoCA lexical frequency file is tab-delimited. Value sent to read.table.

na.strings

Sent to read.table.

quote

Some fields in the CoCA file contain the apostrophe character ('), so it is removed from the read.table default for this parameter, leaving only the double quote. Sent to read.table.

header

The CoCA file includes a header. Value sent to read.table.

fill

Overrides the read.table default because the header row of the CoCA frequency file ends with a stray tab, at least in my copy.

skip

Skip 2 comment rows at the top of the file.

simpleWC

If TRUE (the default), add a column of simplified word classes to the returned data.frame. See cocaSimpleWordClass.

...

Additional arguments passed to read.table.

Details

Mostly a convenience wrapper around read.table with reasonable defaults for reading the Corpus of Contemporary American English (COCA) word frequency file (corpus.byu.edu). The file contains tab-delimited text with some idiosyncrasies.
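The effect of these defaults can be illustrated without the real CoCA file. The sketch below writes a small invented sample (hypothetical rows, not actual CoCA data) that mimics the idiosyncrasies described for the arguments: two comment rows, a header row ending in a stray tab, and words containing apostrophes. It then reads the sample with read.table using the same argument values as the cocaReadFreq defaults; the simpleWC step is left out because it relies on cocaSimpleWordClass.

```r
# Invented sample mimicking the file layout described above; NOT real CoCA data.
sample_lines <- c(
  "Comment row 1 (skipped)",
  "Comment row 2 (skipped)",
  "rank\tword\tfreq\t",   # header row ending in a stray tab
  "1\tthe\t100",
  "2\tn't\t50",
  "3\to'clock\t10"
)
path <- tempfile(fileext = ".txt")
writeLines(sample_lines, path)

# Same read.table arguments as the cocaReadFreq defaults (simpleWC omitted).
# quote = "\"" stops the apostrophes in n't and o'clock from being read as
# opening quotes; fill = TRUE pads the data rows, which have one field fewer
# than the header, with NA.
freq <- read.table(path, sep = "\t", na.strings = "  ", quote = "\"",
                   header = TRUE, fill = TRUE, skip = 2)
print(freq)

# With fill = FALSE, the field-count mismatch between the header row and
# the data rows makes read.table stop with an error.
res <- tryCatch(
  read.table(path, sep = "\t", quote = "\"", header = TRUE,
             fill = FALSE, skip = 2),
  error = function(e) e
)
print(inherits(res, "error"))
```

The extra, all-NA column in the result comes from the stray trailing tab in the header row.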

Contents of the data.frame, as documented in CoCA itself.

The following information is adapted from the spreadsheet version of the lexical frequency table that is distributed with CoCA itself.

This spreadsheet contains the 100,000 word list (http://www.wordfrequency.info/100k.asp) that is based on the Corpus of Contemporary American English (COCA; http://corpus.byu.edu/coca/) and other corpora (http://corpus.byu.edu).

This copy of the data cannot be shared with others. Note also that a small change has been made to the data in this spreadsheet to identify you as the source of the spreadsheet.

The file includes a great deal of data from several different corpora. Column contents are listed below, by column name.

Column

Value

a data.frame

Author(s)

Dave Braze davebraze@gmail.com

See Also

read.table


davebraze/FDB1 documentation built on May 14, 2019, 8:59 p.m.