Cleans a corpus according to user-specified commands. The files under ipath are processed in parallel, with clean_file run on each one, and the cleaned version of each file is written to the specified output directory. Make sure the output directory either does not exist yet or contains nothing important, because this function deletes whatever is already in it. See the documentation for clean_file for the commands that can be passed to the cleaning script.
clean_corpus(ipath, odir, ncores, clean_commands_str)
ipath | A string specifying the path to all the text files to handle.
odir | A string specifying the path to an output directory.
ncores | A number specifying the number of cores to use.
clean_commands_str | A string containing the combined commands for the cleaning script.
## Not run:
clean_corpus("/path/to/corpus/", "./cleaned/", 20, "-lnprsd --maintain-newlines --min-size 2")
## End(Not run)
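Because clean_corpus deletes whatever is already in the output directory, it can help to check that directory before calling it. The following is a minimal sketch of such a guard, not part of the package: the helper name is_safe_output_dir is hypothetical, and it only uses base R functions (dir.exists, list.files). The call to clean_corpus itself is left as a comment so the sketch runs without the package installed.

```r
# Hypothetical helper (an assumption, not part of the package):
# returns TRUE when odir does not exist yet or exists but is empty.
is_safe_output_dir <- function(odir) {
  if (!dir.exists(odir)) {
    return(TRUE)
  }
  # all.files = TRUE also counts hidden files; no.. = TRUE drops "." and "..".
  length(list.files(odir, all.files = TRUE, no.. = TRUE)) == 0
}

odir <- "./cleaned/"
if (is_safe_output_dir(odir)) {
  # Nothing would be lost, so the call from the example above could go here:
  # clean_corpus("/path/to/corpus/", odir, 20, "-lnprsd --maintain-newlines --min-size 2")
  message("Output directory is missing or empty: ", odir)
} else {
  message("Refusing to run: ", odir, " already contains files")
}
```

The guard treats hidden files as content worth protecting, which is the more cautious choice when the directory is about to be wiped.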