docList
returns a special tei2r object that contains a list of information about your document collection.
directory
A string that gives the filepath to the main directory (folder), which holds all the files in the collection.
filenames
A vector containing all of the filenames for the documents in the collection.
paths
A vector containing the full path to each file in the collection.
indexFile
A string that gives the filepath to the index file for the corpus. This file should house the meta-data for each file in the corpus.
index
A data frame that holds the meta-data for each document in the corpus. This data frame is created by reading the file found at indexFile.
stopwordsFile
A string that gives the filepath to the file that contains a comma-separated list of words to be removed during text cleanup.
stopwords
A vector derived from the stopwordsFile
that is passed
to the text cleanup functions in order for them to be removed
from the text.
texts
A list of character vectors, each drawn from documents in the collection, and each placed in the order provided by the index.
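As a rough illustration of how the stopwordsFile and stopwords slots relate, the base-R sketch below reads a comma-separated word list into a vector and drops those words from a tokenized text. It is not the package's own cleanup code: the file contents, the sample tokens, and the helper name remove_stopwords are made up for the example.

```r
# Hypothetical stopwords file: a single comma-separated line of words.
stopwordsFile <- tempfile(fileext = ".txt")
writeLines("the, of, and, a", stopwordsFile)

# Read the file into a character vector, trimming whitespace around each word.
stopwords <- scan(stopwordsFile, what = character(), sep = ",",
                  strip.white = TRUE, quiet = TRUE)

# Illustrative helper (not part of tei2r): drop stopwords from a token vector,
# comparing case-insensitively.
remove_stopwords <- function(tokens, stopwords) {
  tokens[!(tolower(tokens) %in% stopwords)]
}

tokens <- c("The", "history", "of", "a", "collection")
cleaned <- remove_stopwords(tokens, stopwords)
```

Here cleaned keeps only the tokens that do not appear in the stopwords vector.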
The docList is the foundation of the tei2r package and should be the first object created when working with the package. The object is constructed by calling the buildDocList function. This function builds the object by storing the path to the collection's files (directory), the file containing the collection's meta-data (indexFile), and the stopwords file (stopwordsFile). From these pieces of information, the function automatically determines the filenames and paths for the collection's files.
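The construction described above can be approximated in base R. The sketch below is a stand-in, not the package's buildDocList: the function name sketch_doc_list, its arguments, the assumption that documents are .xml files, and the assumption that the index is a CSV file are all illustrative choices.

```r
# Illustrative stand-in for the construction described above; this is NOT
# tei2r's buildDocList, and the argument names are assumptions.
sketch_doc_list <- function(directory, indexFile, stopwordsFile) {
  # Meta-data: one row per document (assumed here to be stored as CSV).
  index <- read.csv(indexFile, stringsAsFactors = FALSE)
  # Stopwords: a comma-separated list of words.
  stopwords <- scan(stopwordsFile, what = character(), sep = ",",
                    strip.white = TRUE, quiet = TRUE)
  # Filenames and full paths are derived from the directory alone.
  filenames <- list.files(directory, pattern = "\\.xml$")
  paths <- file.path(directory, filenames)
  list(directory = directory, filenames = filenames, paths = paths,
       indexFile = indexFile, index = index,
       stopwordsFile = stopwordsFile, stopwords = stopwords)
}

# Throwaway collection, index, and stopwords file for demonstration.
directory <- tempfile("collection")
dir.create(directory)
writeLines("<TEI/>", file.path(directory, "doc1.xml"))
writeLines("<TEI/>", file.path(directory, "doc2.xml"))
indexFile <- tempfile(fileext = ".csv")
writeLines(c("id,title", "doc1,First", "doc2,Second"), indexFile)
stopwordsFile <- tempfile(fileext = ".txt")
writeLines("the, of, and", stopwordsFile)

dl <- sketch_doc_list(directory, indexFile, stopwordsFile)
```

In the package itself the equivalent object comes from buildDocList; its help page is the place to check the actual arguments and any additional slots, such as texts.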
buildDocList