buildDocList: Create a structured list of your collection's metadata
In michaelgavin/tei2r: Import and parse TEI files in R


View source: R/buildDocList.R

Create a structured list of your collection's metadata

buildDocList(directory = "", stopwordsFile = "", indexFile = "", import = TRUE, normalize = TRUE)

`directory`	A string that is the path to the directory where the files that make up your corpus are located.
`stopwordsFile`	A string that is the path to the file containing the words to be removed from the text by the `cleanup` function. If left blank, the default stopwords will be used.
`indexFile`	A string that is the path to the index file for the collection. This function expects to find a .csv file that contains the metadata for your collection, including a column that points to the names of the files (with or without the file extensions). If you do not have an index file, this parameter can be left blank and the collection will take all files in the directory.
`import`	A logical value. If TRUE, the texts will be imported to your `docList` and stored as a list of character vectors. This runs the `importTexts` function over the collection, which in turn runs the `cleanup` function to process the texts for analysis.
`normalize`	A logical value. If TRUE, the texts will be normalized upon import.
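
The `indexFile` argument expects a .csv file with one column naming the corpus files. As a minimal sketch of how such a file might be prepared with base R: the folder, the file names, and the column names `filename` and `title` below are all hypothetical, since this page does not say which column name the package looks for.

```r
# Hypothetical corpus folder with two placeholder files, for illustration only.
corpus_dir <- file.path(tempdir(), "tei2r_index_demo")
dir.create(corpus_dir, showWarnings = FALSE)
writeLines("<TEI/>", file.path(corpus_dir, "doc1.xml"))
writeLines("<TEI/>", file.path(corpus_dir, "doc2.xml"))

# One row per document. File names are stored without extensions,
# which the argument description says is allowed.
# The column names are assumptions, not a documented requirement.
xml_files <- list.files(corpus_dir, pattern = "\\.xml$")
index <- data.frame(
  filename = tools::file_path_sans_ext(xml_files),
  title = c("First text", "Second text"),
  stringsAsFactors = FALSE
)

# Write the index outside the corpus folder so it is not mistaken for a text.
index_path <- file.path(tempdir(), "tei2r_index_demo.csv")
write.csv(index, index_path, row.names = FALSE)
```

The resulting `index_path` could then be passed as `indexFile`, with `corpus_dir` as `directory`.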

dl The completed docList object for use with the other functions of the tei2r package.

The docList object is the foundation of the tei2r package and contains references to all of the basic information required to begin working with texts in R. The buildDocList function is designed to construct the docList object with as much or as little information as is available. You can begin either with a collection of plain-text or TEI-encoded documents stored in a folder (a 'directory'), or you can begin by searching the EEBO-TCP collection and downloading its files using tcpSearch and tcpDownload.
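
Because a blank `indexFile` means the collection takes all files in the directory, it can help to check what a folder holds before building the docList. The following is a rough sketch using base R only; the folder and file names are hypothetical, and it does not reproduce how tei2r itself scans the directory.

```r
# Hypothetical folder standing in for a corpus directory.
corpus_dir <- file.path(tempdir(), "tei2r_preview_demo")
dir.create(corpus_dir, showWarnings = FALSE)
writeLines("<TEI/>", file.path(corpus_dir, "A00001.xml"))
writeLines("plain text", file.path(corpus_dir, "notes.txt"))

# List every file in the folder, then count files by extension
# to spot anything that should not be treated as part of the corpus.
files <- list.files(corpus_dir)
ext_counts <- table(tools::file_ext(files))
print(ext_counts)
```

Stray files found this way can be moved out of the folder, or excluded by supplying an index file.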

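# Build a docList from every file in a directory
dl <- buildDocList(directory = "~/path/to/your/collection/files")
# Build a docList with a custom stopwords file and an index file
dl <- buildDocList(directory = "~/path/to/your/collection/files", stopwordsFile = "~/path/to/your/stopwords/file", indexFile = "~/path/to/your/index/File/")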