This function takes the .xml documents from a corpus of forum posts and returns the unique tokens and their frequencies in a single data frame. It may also work for other forum corpora with a similar structure.
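This page does not show how the tokens are extracted from the XML or how they are counted. Below is a minimal base-R sketch of the counting step only. It assumes the post text has already been pulled out of each .xml file into a character vector. The helper name `token_frequencies`, the tokenization (lowercase, split on runs of non-letters), and the column names `token` and `frequency` are illustrative assumptions, not the function's actual internals.

```r
# Hypothetical helper (not part of the package): count unique tokens across documents
token_frequencies <- function(texts) {
  # Assumed tokenization: lowercase, then split on runs of non-letter characters
  tokens <- unlist(strsplit(tolower(texts), "[^[:alpha:]]+"))
  tokens <- tokens[nzchar(tokens)]  # drop empty strings left by leading separators
  counts <- table(tokens)
  df <- data.frame(token = names(counts), frequency = as.integer(counts),
                   stringsAsFactors = FALSE)
  # Most frequent tokens first, ties broken alphabetically
  df <- df[order(-df$frequency, df$token), , drop = FALSE]
  rownames(df) <- NULL
  df
}

texts <- c("The cat sat.", "The cat ran!")
freq <- token_frequencies(texts)
print(freq)
```

Using `table()` keeps the sketch dependency-free; a real XML corpus would additionally need a parser to pull the post bodies out of each file before this step.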
XMLgetOneDataFrame(pathToFolder, minMaxWordCount = 300)
pathToFolder: the path to the folder containing the corpus.
minMaxWordCount: documents with fewer tokens than this value are excluded, and documents with more tokens are cropped to this length. Defaults to 300. If set to NULL, documents are not cropped and are excluded only if they contain no usable characters.
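The documented `minMaxWordCount` rule can be sketched in base R as follows. This is an illustration under stated assumptions, not the function's source: the helper name `apply_min_max` is hypothetical, each document is assumed to be already tokenized into a character vector, and "no usable characters" is approximated as "zero tokens".

```r
# Hypothetical helper (not part of the package) mirroring the documented minMaxWordCount rule
apply_min_max <- function(docs, minMaxWordCount = 300) {
  # docs: a named list of character vectors, one vector of tokens per document
  if (is.null(minMaxWordCount)) {
    # NULL: no cropping; drop only documents with no tokens at all
    return(Filter(function(tok) length(tok) > 0, docs))
  }
  # Exclude documents with fewer tokens than the threshold
  kept <- Filter(function(tok) length(tok) >= minMaxWordCount, docs)
  # Crop every remaining document to exactly minMaxWordCount tokens
  lapply(kept, function(tok) tok[seq_len(minMaxWordCount)])
}

docs <- list(a = c("x", "y", "z", "w"), b = c("x"), c = character(0))
cropped <- apply_min_max(docs, minMaxWordCount = 3)
uncut <- apply_min_max(docs, minMaxWordCount = NULL)
```

Because every kept document is cut to the same length, a single numeric threshold acts as both the minimum and the maximum, which is presumably where the parameter name comes from.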