getTypeTokenRatios: Function to get the type token ratios for a corpus


Description

This function crops each document in a corpus to a fixed word count and skips documents shorter than that count. The accepted documents are then tokenized, and the type/token ratio of each is calculated and returned.

This function takes a data structure created by this package, using methods such as getFromFolderWF, and returns the type/token ratios.
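
As a rough illustration of how a type/token ratio follows from token counts, the sketch below uses a small hand-built matrix. The layout (one row per document, one column per unique token) is an assumption made for this example; the actual structure returned by getFromFolderWF is not described here and may differ.

```r
# Assumed layout (not confirmed by the package documentation): one row
# per document, one column per unique token, cells holding token counts.
wfm <- matrix(
  c(2, 1, 1, 0,
    1, 1, 1, 1),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("doc1", "doc2"), c("the", "cat", "sat", "mat"))
)

# Types: number of distinct tokens with a nonzero count in each row.
types <- rowSums(wfm > 0)
# Tokens: total number of token occurrences in each row.
tokens <- rowSums(wfm)

# Type/token ratio per document.
typeTokenRatios <- types / tokens
```

Here doc1 uses 3 distinct tokens across 4 occurrences (ratio 0.75), while doc2 never repeats a token (ratio 1).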

Usage

getTypeTokenRatios(wordFrequencyMatrix)

Arguments

wordFrequencyMatrix

a data structure generated by this package which contains the unique tokens and their counts

path

the path to the folder containing the corpus

minMaxWordCount

documents with fewer tokens than this count are rejected, and documents with more tokens are cropped to it; defaults to 300.
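
The cropping rule behind minMaxWordCount, together with the ratio calculation, can be sketched as follows. This is a minimal illustration and not the package's implementation: sketchTypeTokenRatios is a hypothetical helper, and splitting on whitespace is an assumed tokenization.

```r
# Hypothetical helper, not part of the package: drop documents shorter
# than `minMaxWordCount` tokens, crop the rest to that length, and
# compute the type/token ratio of each accepted document.
sketchTypeTokenRatios <- function(documents, minMaxWordCount = 300) {
  # Naive tokenization (an assumption): lowercase, split on whitespace.
  tokenLists <- lapply(documents, function(doc) {
    tokens <- strsplit(tolower(trimws(doc)), "\\s+")[[1]]
    tokens[nzchar(tokens)]
  })
  # Keep only documents with at least `minMaxWordCount` tokens.
  longEnough <- vapply(tokenLists, length, integer(1)) >= minMaxWordCount
  accepted <- tokenLists[longEnough]
  # Crop to exactly `minMaxWordCount` tokens, then divide types by tokens.
  vapply(accepted, function(tokens) {
    cropped <- tokens[seq_len(minMaxWordCount)]
    length(unique(cropped)) / length(cropped)
  }, numeric(1))
}

docs <- c(
  a = "the cat sat on the mat",
  b = "too short",
  c = "one two three four five six"
)
ratios <- sketchTypeTokenRatios(docs, minMaxWordCount = 5)
```

With a count of 5, document b is rejected, document a is cropped to "the cat sat on the" (4 types over 5 tokens, ratio 0.8), and document c is cropped to five distinct tokens (ratio 1). Cropping every document to the same length keeps the ratios comparable, since the type/token ratio tends to fall as documents get longer.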


mouse0/suicideProject documentation built on May 3, 2019, 5:19 p.m.