processXMLwf: Function to process XML data from a specific corpus into...

Description

This function takes the .xml documents from a corpus of forum posts and returns the unique tokens and their frequencies. It may also work for other forum corpora that have a similar structure.
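
To make "unique tokens and their frequencies" concrete, here is a minimal base R sketch of that kind of count on a single string. It is not the package's implementation: the sample text and the tokenisation rule (lowercase, split on non-letters) are assumptions made for illustration only.

```r
# Hypothetical example text; the real function reads posts from .xml files.
post <- "I feel fine today. Today I feel better."

# Assumed tokenisation rule: lowercase, then split on runs of non-letters.
tokens <- strsplit(tolower(post), "[^a-z]+")[[1]]
tokens <- tokens[tokens != ""]  # drop empty strings left by the split

# Count each unique token and order from most to least frequent.
freq <- sort(table(tokens), decreasing = TRUE)
print(freq)
```

Here `freq` is a named table with one entry per unique token. The actual return type of processXMLwf is not stated on this page and may differ.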

Usage

processXMLwf(pathToFolder, minMaxWordCount = 300)

Arguments

pathToFolder

the path to the folder containing the corpus

minMaxWordCount

Documents with fewer tokens than this value are rejected, and documents with more tokens are cropped to this length. Defaults to 300. If set to NULL, documents are not cropped and are excluded only if they contain no usable characters.
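
The rule for minMaxWordCount can be sketched as a small helper that works on one document's token vector. The name applyMinMaxWordCount is hypothetical and not part of the package, and treating "no usable characters" as an empty token vector is an assumption.

```r
# Hypothetical helper, not part of the package: applies the documented
# minMaxWordCount rule to one document given as a character vector of tokens.
applyMinMaxWordCount <- function(tokens, minMaxWordCount = 300) {
  if (is.null(minMaxWordCount)) {
    # NULL: never crop; reject only documents with nothing usable
    # (assumed here to mean an empty token vector).
    if (length(tokens) == 0) return(NULL)
    return(tokens)
  }
  # Reject documents shorter than the threshold.
  if (length(tokens) < minMaxWordCount) return(NULL)
  # Crop longer documents to exactly the threshold.
  tokens[seq_len(minMaxWordCount)]
}

short_doc <- c("too", "short")
long_doc  <- c("this", "one", "is", "long", "enough")

applyMinMaxWordCount(short_doc, 3)     # rejected: returns NULL
applyMinMaxWordCount(long_doc, 3)      # cropped to the first 3 tokens
applyMinMaxWordCount(short_doc, NULL)  # kept unchanged
```

With a single threshold acting as both minimum and maximum, every accepted document ends up with exactly the same number of tokens, which keeps frequencies comparable across documents.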


mouse0/suicideProject documentation built on May 3, 2019, 5:19 p.m.