create_cooccurrence_matrix: Create co-occurrence matrix from a list of transcripts

View source: R/create_cooccurrence_matrix.R

Description

If the input is a character vector, it is treated as a single transcript. If a list of character vectors is provided, the elements are handled as separate transcripts. Co-occurrences are not tracked across transcripts.

Usage

create_cooccurrence_matrix(tokens, window_size, types = NULL)

tabulate_cooccurrence_among_types(tokens, types, window_size)

get_forward_windows(tokens, type, window_size)

Arguments

tokens

A character vector or list of character vectors

window_size

The size of the forward-looking window within which co-occurrence should be tabulated.

types

An optional argument that defines the rows and columns of the returned co-occurrence matrix.

Details

In a forward-looking window of size k, the first word in the window is associated with the remaining k - 1 words in the window. If the window is of size 2 and consists of cow, duck, then the counter tracking the number of times cow is followed by duck is incremented by one. In the returned co-occurrence matrix, this means incrementing the value in the row for "cow" and the column for "duck".
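The counting rule above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: count_forward_pairs is a hypothetical helper, and it assumes that windows running past the end of the transcript are simply truncated.

```r
# Hypothetical helper, not part of netbuildr: tabulate forward-looking
# co-occurrences within a single transcript using base R only.
count_forward_pairs <- function(tokens, window_size) {
  types <- sort(unique(tokens))
  m <- matrix(0L, nrow = length(types), ncol = length(types),
              dimnames = list(types, types))
  n <- length(tokens)
  for (i in seq_len(n)) {
    # The window starts at position i and covers window_size tokens,
    # truncated at the end of the transcript (an assumption).
    last <- min(i + window_size - 1, n)
    if (last > i) {
      for (j in (i + 1):last) {
        # Row = first word in the window, column = a following word.
        m[tokens[i], tokens[j]] <- m[tokens[i], tokens[j]] + 1L
      }
    }
  }
  m
}

m <- count_forward_pairs(c("cow", "duck", "cow", "duck"), window_size = 2)
m["cow", "duck"]  # 2: "cow" is followed by "duck" twice
m["duck", "cow"]  # 1: "duck" is followed by "cow" once
```

The matrix is not symmetric, because only the first word of each window is credited with the words that follow it.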

If types are provided, then only co-occurrences between the provided types are counted.
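A rough sketch of restricting the count to a set of types follows. The helper forward_pairs is hypothetical, and the sketch assumes that windows are built over the full token sequence and that pairs involving other words are dropped afterwards; the package may order these steps differently.

```r
# Hypothetical helper, not part of netbuildr: list every (first, follower)
# pair produced by forward-looking windows over one transcript.
forward_pairs <- function(tokens, window_size) {
  n <- length(tokens)
  pairs <- lapply(seq_len(n), function(i) {
    last <- min(i + window_size - 1, n)
    if (last <= i) return(NULL)
    data.frame(first = tokens[i], follower = tokens[(i + 1):last],
               stringsAsFactors = FALSE)
  })
  # Drop positions that produced no pairs before binding rows together.
  do.call(rbind, Filter(Negate(is.null), pairs))
}

tokens <- c("cow", "duck", "pig", "cow", "pig")
types <- c("cow", "pig")
pairs <- forward_pairs(tokens, window_size = 3)

# Assumption: keep a pair only if both words are among the provided types.
kept <- pairs[pairs$first %in% types & pairs$follower %in% types, ]

# Rows and columns are fixed by 'types', even for types with zero counts.
m <- table(factor(kept$first, levels = types),
           factor(kept$follower, levels = types))
m["cow", "pig"]  # 2
m["pig", "cow"]  # 1
```

Fixing the factor levels to types keeps the matrix dimensions stable regardless of which types actually appear in the transcript.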

If the tokens input is a list of character vectors, forward-looking windows do NOT span list elements. Thus, if the input were:

list(c('cow', 'duck'), c('pig', 'chicken'))

then trying to construct a forward-looking window of size 2 beginning on the second token in the first character vector ("duck") would yield c('duck', NA). Co-occurrences are not tracked across list elements, because the elements are not considered to be adjacent to each other in speech/text.
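The effect of the transcript boundary can be illustrated with a small base R sketch. The helper pair_labels is hypothetical and is not the package's code; it only contrasts per-transcript windows with what would happen if the list were flattened into one vector first.

```r
# Hypothetical helper, not part of netbuildr: forward pairs within one
# transcript, written as "first>follower" strings.
pair_labels <- function(tokens, window_size) {
  n <- length(tokens)
  unlist(lapply(seq_len(n), function(i) {
    last <- min(i + window_size - 1, n)
    if (last <= i) return(character(0))
    paste(tokens[i], tokens[(i + 1):last], sep = ">")
  }))
}

transcripts <- list(c("cow", "duck"), c("pig", "chicken"))

# Windows built separately for each list element: no pair spans the boundary.
per_transcript <- unlist(lapply(transcripts, pair_labels, window_size = 2))
per_transcript  # "cow>duck" "pig>chicken"

# Flattening first would wrongly treat "duck" and "pig" as adjacent.
flattened <- pair_labels(unlist(transcripts), window_size = 2)
flattened  # "cow>duck" "duck>pig" "pig>chicken"
```

The spurious "duck>pig" pair in the flattened version is exactly what handling list elements separately avoids.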

crcox/netbuildr documentation built on Dec. 19, 2021, 6:19 p.m.