read_freqs: Read and write plain text frequency lists

Source: R/io.R

read_freqs    R Documentation


Description

Convenience wrappers to import and export typical output from CWB (Corpus Workbench) tools. The expected file format is tab-delimited and contains only data: no quotes, missing values, comments, or header.
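To make the expected format concrete, the following sketch writes a small frequency list with base R's write.table(), using the same sep, quote and na values that appear in the write_freqs signature, and then reads back the raw lines. The data are made up, and using write.table() here is an illustrative assumption, not necessarily how write_freqs is implemented.

```r
# Hypothetical data: integer frequencies and word types
freqs <- data.frame(f = c(120L, 45L), type = c("the", "corpus"))
path <- tempfile(fileext = ".txt")

# Tab-delimited, no quotes, empty string for NA, no header, no row names
write.table(freqs, path, sep = "\t", quote = FALSE, na = "",
            row.names = FALSE, col.names = FALSE)

# Inspect the raw lines: one frequency and one type per line
raw <- readLines(path)
print(raw)
```

Each line holds only the data fields separated by a single tab, which is the layout read_freqs expects by default.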

Usage

read_freqs(
  file,
  header = FALSE,
  cols = list(0L, ""),
  sep = "\t",
  comment.char = "",
  na.strings = "",
  quote = "",
  allowEscapes = FALSE,
  nlines = sh_count_lines(file),
  ...
)

write_freqs(..., sep = "\t", quote = FALSE, na = "")

fread_freqs(
  ...,
  header = FALSE,
  sep = "\t",
  quote = "",
  na.strings = NULL,
  stringsAsFactors = FALSE
)

fwrite_freqs(..., sep = "\t", sep2 = " ", quote = FALSE)

Arguments

header

logical. Whether the first line should be used as column names. Note: names given in cols take precedence.

cols

list. A list of column types (e.g. 0L for integer, 0 for numeric, "" for character). If the list is named, the names are used as column names. By default, column 1 is expected to contain integer frequencies and column 2 the type strings.
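A sketch of how such a list of column types can drive base R's scan(), assuming cols is used like scan()'s what argument (this page does not state that explicitly). The file contents and the column names f, type and text_id are hypothetical.

```r
# Hypothetical input: frequency, word type, text id (tab-delimited)
path <- tempfile(fileext = ".txt")
writeLines(c("120\tthe\tA01", "45\tcorpus\tA02"), path)

# Named list of prototypes: 0L = integer, "" = character
cols <- list(f = 0L, type = "", text_id = "")

# With what = a list, scan() returns one vector per column,
# named after the list elements
x <- scan(path, what = cols, sep = "\t", quote = "",
          comment.char = "", na.strings = "", quiet = TRUE)
freqs <- as.data.frame(x, stringsAsFactors = FALSE)
str(freqs)
```

Because the list is named, the resulting columns carry those names, and the integer prototype 0L keeps the frequencies as integers rather than doubles.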

sep

character. Field separator; the default is a tab ("\t").

comment.char

character. Comment character; the default "" turns off the interpretation of comments.

na.strings

character. Strings to be interpreted as missing (NA) values, see scan() and fread().

quote

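character. Set of quoting characters; the default "" disables quoting. In write_freqs and fwrite_freqs, quote is logical and defaults to FALSE.

allowEscapes

logical. Whether C-style escapes such as \n should be processed, see scan().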

nlines

integer. Number of lines in file, passed to scan() as nlines. Defaults to sh_count_lines(file).

...

Further arguments passed on to the underlying function, e.g. scan() in read_freqs or fread() in fread_freqs.

file

character. Path to file or connection, see scan().

skip

integer. Number of lines to skip at the beginning of the file, see scan().

Details

These are convenience wrappers around scan() or fread() with sensible defaults for common frequency list formats. In read_freqs, wc -l is run, if available, to count the lines of the file, and the count is passed to scan() as nlines. This reduces memory overhead substantially and can also be somewhat faster.
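A minimal sketch of the line-counting idea, using a hypothetical helper count_lines_sketch() rather than the package's own sh_count_lines(), whose implementation is not shown here. It calls wc -l through system2() when wc is on the PATH and falls back to counting lines in R otherwise; the file contents are made up.

```r
# Hypothetical helper (not the package's sh_count_lines()):
# count lines with `wc -l` when available, else in R
count_lines_sketch <- function(path) {
  if (nzchar(Sys.which("wc"))) {
    out <- system2("wc", c("-l", shQuote(path)), stdout = TRUE)
    # wc prints "<count> <path>"; keep the first field
    as.integer(strsplit(trimws(out), "\\s+")[[1]][1])
  } else {
    length(readLines(path))
  }
}

path <- tempfile(fileext = ".txt")
writeLines(c("120\tthe", "45\tcorpus", "7\tworkbench"), path)

n <- count_lines_sketch(path)
# nlines caps the number of lines scan() reads
x <- scan(path, what = list(f = 0L, type = ""), sep = "\t",
          quote = "", nlines = n, quiet = TRUE)
```

Note that wc -l counts newline characters, so a final line without a trailing newline is not counted; with nlines set from that count, such a line would not be read.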

Value

data.frame or data.table

See Also

scan(), fread()

Examples

## Not run: 
path <- "brown_word_per_id.txt"
# columns: frequency (integer), type and text id (character)
out <- read_freqs(path, cols = list(f = 0L, type = "", text_id = ""))

## End(Not run)

alex-raw/cwbwrapr documentation built on Oct. 23, 2022, 9:08 p.m.