get_dispersion: Dispersion of tokens in a text
In dhlabR: National Library of Norway Quantitative Text Data API Tools

get_dispersion

R Documentation

Dispersion of tokens in a text

Description

This function wraps a call to the dispersion service, which calculates how a list of tokens is dispersed throughout a text in the National Library of Norway's collection. The text is identified by its URN and divided into chunks, and the count of tokens in each chunk is returned.

Usage

get_dispersion(urn = NULL, words = list(".", ","), window = 500, pr = 100)

Arguments

`urn`	A National Library of Norway URN to a text object.
`words`	A list or vector of words (tokens) to analyze for dispersion.
`window`	The size of each text chunk in which the tokens are counted.
`pr`	(Per) The step size: how far to move forward from the start of one chunk to the start of the next. If 'pr' is equal to 'window', the text is divided into non-overlapping chunks of size 'window'. If 'pr' is smaller than 'window', consecutive chunks overlap, which gives a smoother curve.
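The interaction between 'window' and 'pr' can be illustrated locally, without calling the service. The sketch below is not part of dhlabR: `sketch_dispersion` is a hypothetical helper, and it assumes that chunks start at tokens 1, 1 + pr, 1 + 2 * pr, and so on, and that the last chunk may be shorter than 'window'. The service's actual chunk boundaries and the layout of the returned data frame may differ.

```r
# Hypothetical helper (not part of dhlabR): count target words in sliding
# chunks of a token vector. Assumes chunks start at 1, 1 + pr, 1 + 2 * pr, ...
# and that the final chunk may be shorter than `window`.
sketch_dispersion <- function(tokens, words, window = 500, pr = 100) {
  stopifnot(length(tokens) > 0, window >= 1, pr >= 1)
  starts <- seq(1, length(tokens), by = pr)
  counts <- lapply(starts, function(s) {
    chunk <- tokens[s:min(s + window - 1, length(tokens))]
    # One count per target word within this chunk
    vapply(words, function(w) sum(chunk == w), integer(1))
  })
  out <- as.data.frame(do.call(rbind, counts))
  # `start` is an illustrative column: the index of each chunk's first token
  cbind(start = starts, out)
}

tokens <- c("Mina", "saw", "Mina", "and", "Dracula", "fled", "from", "Dracula")
words <- c("Dracula", "Mina")

# pr equal to window: two non-overlapping chunks of four tokens
non_overlapping <- sketch_dispersion(tokens, words, window = 4, pr = 4)

# pr smaller than window: four overlapping chunks, so counts change gradually
overlapping <- sketch_dispersion(tokens, words, window = 4, pr = 2)
```

With the step halved, each token falls into up to two chunks, which is why the overlapping counts shift more gradually from one row to the next.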

Value

A data frame with the count of tokens in each chunk.

Examples

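library(dhlabR)

# URN of a digitized book in the National Library of Norway's collection
urn <- "URN:NBN:no-nb_digibok_2013060406055"
words <- c("Dracula", "Mina", "Helsing")
# Equal window and pr give non-overlapping chunks of size 1000
window <- 1000
pr <- 1000
dispersion_result <- get_dispersion(urn, words, window, pr)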

dhlabR documentation built on Sept. 11, 2024, 9:12 p.m.