read.dic: Read/Write Dictionary Files

read.dic R Documentation

Description

Read in or write dictionary files in Comma-Separated Values (.csv; weighted) or Linguistic Inquiry and Word Count (.dic; non-weighted) format.

Usage

read.dic(path, cats = NULL, type = "asis", as.weighted = FALSE,
  dir = getOption("lingmatch.dict.dir"), ..., term.name = "term",
  category.name = "category", raw = FALSE)

write.dic(dict, filename = NULL, type = "asis", as.weighted = FALSE,
  save = TRUE)

Arguments

path

One of: a path to a file; a name corresponding to a file in getOption('lingmatch.dict.dir') (or '~/Dictionaries') or to one of the dictionaries available at osf.io/y6g5b; a matrix-like object to be categorized; or a list to be formatted.

cats

A character vector of category names to be returned. All categories are returned by default.

type

A character string indicating whether and how terms should be altered. If unspecified or matching 'asis', terms are left as they are. Other options convert wildcards to regular expressions. 'pattern' (any value matching '^[poi]') replaces initial asterisks with '\\b\\w*' and terminal asterisks with '\\w*\\b', to match terms within raw text. Any other value pads terms with ^ and $, then removes those bounding marks wherever an asterisk is present, to match tokenized terms.

as.weighted

Logical; if TRUE, prevents weighted dictionaries from being converted to unweighted versions, or converts unweighted dictionaries to a binary weighted version – a data.frame with a "term" column of unique terms, and a column for each category.

dir

Path to a folder containing dictionaries, or where you would like dictionaries to be downloaded; passed to select.dict and/or download.dict.

...

Additional arguments passed to readLines.

term.name, category.name

Strings identifying column names in path containing terms and categories respectively.

raw

A logical or a character vector. As a logical, indicates whether path should be treated as a raw dictionary (as might be read in from a .dic file). As a character, it is used in place of path, as if it had been read in from a file.

dict

A list with a named entry of terms for each category, or a data.frame with terms in one column, and categories or weights in the rest.

filename

The name of the file to be saved.

save

Logical: if FALSE, does not write a file.
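
The wildcard conversions described for type can be sketched in base R. This is a minimal illustration under stated assumptions, not lingmatch's internal code: convert_wildcards is a hypothetical helper, exact equality with "asis" stands in for however the package matches that value, and only a single leading or trailing asterisk per term is handled.

```r
# Hypothetical helper illustrating the conversions described for `type`;
# this is not lingmatch's implementation.
convert_wildcards <- function(terms, type = "asis") {
  # simplification (assumption): only the exact value "asis" leaves terms untouched
  if (identical(type, "asis")) {
    return(terms)
  }
  if (grepl("^[poi]", type)) {
    # patterns for raw text: an asterisk becomes "any word characters"
    # bounded by a word boundary
    terms <- sub("^\\*", "\\\\b\\\\w*", terms)
    terms <- sub("\\*$", "\\\\w*\\\\b", terms)
  } else {
    # patterns for tokens: anchor both ends, then drop the anchor
    # on whichever side had an asterisk
    terms <- paste0("^", terms, "$")
    terms <- sub("^\\^\\*", "", terms)
    terms <- sub("\\*\\$$", "", terms)
  }
  terms
}

terms <- c("kill*", "dying", "*ing")
raw_patterns <- convert_wildcards(terms, "pattern")
token_patterns <- convert_wildcards(terms, "regex")

# raw-text patterns can match inside a longer string
in_text <- grepl(raw_patterns[1], "they were killed", perl = TRUE)
# token patterns are meant to be applied to single tokens
in_tokens <- grepl(token_patterns[1], c("killer", "skill"))
```

In this sketch, the raw-text form lets "kill*" match a word beginning with "kill" anywhere in a sentence, while the token form matches only tokens that begin with "kill", so "skill" is not matched.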

Value

read.dic: A list (unweighted) with an entry for each category containing character vectors of terms, or a data.frame (weighted) with columns for terms (first, "term") and weights (all subsequent, with category labels as names).

write.dic: A version of the written dictionary – a raw character vector for unweighted dictionaries, or a data.frame for weighted dictionaries.
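
To make the two layouts returned by read.dic concrete, the following base R sketch builds a binary weighted data.frame by hand from a small list, then converts it back. It assumes 0/1 integer indicators, which may differ from the column types the package actually produces, and all variable names are illustrative.

```r
# a small unweighted dictionary: one character vector of terms per category
dict <- list(
  kill = c("kill*", "murd*"),
  death = c("death*", "kill*")
)

# unique terms across all categories, in order of first appearance
terms <- unique(unlist(dict, use.names = FALSE))

# one "term" column, then one indicator column per category
# (assumption: 1 marks membership, 0 marks absence)
weighted <- data.frame(term = terms, stringsAsFactors = FALSE)
for (category in names(dict)) {
  weighted[[category]] <- as.integer(terms %in% dict[[category]])
}

# going back: collect the terms with a non-zero weight in each category
unweighted <- lapply(weighted[-1], function(w) weighted$term[w != 0])
```

Terms shared between categories, such as "kill*" here, occupy a single row in the weighted layout and have a non-zero entry in each category column they belong to.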

See Also

Other Dictionary functions: download.dict(), lma_patcat(), lma_termcat(), select.dict()

Examples

# make a small murder-related dictionary
dict <- list(
  kill = c("kill*", "murd*", "wound*", "die*"),
  death = c("death*", "dying", "die*", "kill*")
)

# convert it to a weighted format
(dict_weighted <- read.dic(dict, as.weighted = TRUE))

# categorize it back
read.dic(dict_weighted)

# convert it to a string without writing to a file
cat(raw_dict <- write.dic(dict, save = FALSE))

# parse it back in
read.dic(raw = raw_dict)

## Not run: 

# save it as a .dic file
write.dic(dict, "murder")

# read it back in as a list
read.dic("murder.dic")

# read in the Moral Foundations or LUSI dictionaries from urls
moral_dict <- read.dic("https://osf.io/download/whjt2")
lusi_dict <- read.dic("https://www.depts.ttu.edu/psy/lusi/files/lusi_dict.txt")

# save and read in a version of the General Inquirer dictionary
inquirer <- read.dic("inquirer", dir = "~/Dictionaries")

## End(Not run)

miserman/lingmatch documentation built on Jan. 19, 2024, 4:44 p.m.