Import documents from a directory into Mallet format

Description

This function takes a directory path as its only argument and returns a data.frame with two columns, id and text, which can be passed to the mallet.import function. The data.frame has one row for each file in Dir.
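To make the shape of the returned value concrete, the following is a minimal sketch of a function with the same contract, written in base R. It is not the package's implementation: the helper name read_dir_sketch, the use of readLines, the choice to join lines with newlines, and the temporary directory and file names are all assumptions made for illustration.

```r
# Hypothetical helper (not part of the mallet package): read every regular
# file in `dir` and return one row per file with columns `id` and `text`.
read_dir_sketch <- function(dir) {
  files <- list.files(dir, full.names = TRUE)
  files <- files[!dir.exists(files)]  # skip subdirectories
  texts <- vapply(
    files,
    function(f) paste(readLines(f, warn = FALSE), collapse = "\n"),
    character(1),
    USE.NAMES = FALSE
  )
  data.frame(id = files, text = texts, stringsAsFactors = FALSE)
}

# Build a throwaway directory with one document per file.
tmp <- tempfile("docs")
dir.create(tmp)
writeLines("first document", file.path(tmp, "a.txt"))
writeLines(c("second", "document"), file.path(tmp, "b.txt"))

docs <- read_dir_sketch(tmp)
str(docs)
```

The id column here holds the full file path, which keeps document identifiers unique; whether mallet.read.dir uses full paths or bare file names is not stated on this page.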

Usage

mallet.read.dir(Dir)

Arguments

Dir

The path to a directory containing one document per file.

Note

This function was contributed to RMallet by Dan Bowen.

See Also

mallet.import

Examples

## Not run: 
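# Dir is the path to a directory containing one document per file
documents <- mallet.read.dir(Dir)
# "en.txt" is the stoplist file passed to mallet.import
mallet.instances <- mallet.import(documents$id, documents$text, "en.txt",
                                  token.regexp = "\\p{L}[\\p{L}\\p{P}]+\\p{L}")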

## End(Not run)