buildDocList: Create a structured list of your collection's metadata


View source: R/buildDocList.R

Description

Create a structured list of your collection's metadata

Usage

buildDocList(directory = "", stopwordsFile = "", indexFile = "",
  import = TRUE, normalize = TRUE)

Arguments

directory

A string that is the path to the directory where the files that make up your corpus are located.

stopwordsFile

A string that is the path to the file containing the words to be removed from the texts by the cleanup function. If left blank, the default stopwords will be used.

indexFile

A string that is the path to the index file for the collection. This function expects a .csv file that contains the metadata for your collection, including a column that gives the names of the files (with or without the file extensions). If you do not have an index file, this parameter can be left blank and the collection will include all files in the directory.
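As an illustration of the kind of index file described above, the following is a minimal sketch in base R that writes a .csv listing the files in a folder. The column names (filename, title), the sample file names, and the temporary paths are assumptions made for this example; the documentation does not state which column names tei2r expects, so check them against your own collection.

```r
# Hypothetical corpus folder; a temporary directory stands in for a real one.
corpus_dir <- file.path(tempdir(), "corpus_example")
dir.create(corpus_dir, showWarnings = FALSE)
writeLines("First sample text.", file.path(corpus_dir, "A00001.txt"))
writeLines("Second sample text.", file.path(corpus_dir, "A00002.txt"))

# Collect the file names and drop the extensions
# (the index may list names with or without them).
files <- sort(list.files(corpus_dir, pattern = "\\.txt$"))

# Assumed column names: "filename" points to the files, "title" is sample metadata.
index <- data.frame(
  filename = tools::file_path_sans_ext(files),
  title = c("First sample", "Second sample"),
  stringsAsFactors = FALSE
)

# Write the index outside the corpus folder so it is not mistaken for a text.
index_path <- file.path(tempdir(), "index_example.csv")
write.csv(index, index_path, row.names = FALSE)
```

The resulting path (index_path here) is the sort of value that would be passed as indexFile, with corpus_dir as directory.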

import

A logical vector. If TRUE, the texts will be imported into your docList and stored as a list of character vectors. This runs the importTexts function over the collection, which in turn runs a cleanup function that processes the texts for analysis.

normalize

A logical vector. If TRUE, the texts will be normalized upon import.

Value

dl The completed docList object for use with the other functions of the tei2r package.

Details

The docList object is the foundation of the tei2r package and contains references to all of the basic information that is required to begin working with texts in R. The buildDocList function is designed to construct the docList object with as much or as little information as is available. You can begin either with a collection of plain-text or TEI-encoded documents stored in a folder (a 'directory'), or by searching the EEBO-TCP collection and downloading its files using tcpSearch and tcpDownload.

Examples

dl = buildDocList(directory="~/path/to/your/collection/files")
dl = buildDocList(directory = "~/path/to/your/collection/files",
  stopwordsFile = "~/path/to/your/stopwords/file",
  indexFile = "~/path/to/your/index/File/")
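Because the calls above depend on three separate paths, it can help to confirm that they exist before building the docList. The following is a minimal sketch in base R; check_inputs is a hypothetical helper written for this example and is not part of tei2r. It mirrors the documented convention that stopwordsFile and indexFile may be left blank.

```r
# check_inputs() is a hypothetical helper, not a tei2r function.
# It returns a character vector describing any path that cannot be found.
check_inputs <- function(directory, stopwordsFile = "", indexFile = "") {
  problems <- character(0)
  if (!dir.exists(directory)) {
    problems <- c(problems, paste("directory not found:", directory))
  }
  # Blank strings mean "not supplied", so only non-empty paths are checked.
  for (f in c(stopwordsFile, indexFile)) {
    if (nzchar(f) && !file.exists(f)) {
      problems <- c(problems, paste("file not found:", f))
    }
  }
  problems
}

# Demonstration with a temporary directory standing in for a real corpus folder.
demo_dir <- tempdir()
ok <- check_inputs(demo_dir)
missing_index <- check_inputs(
  demo_dir,
  indexFile = file.path(demo_dir, "no_such_index.csv")
)
```

An empty result suggests the paths are in place; any messages returned name the paths to fix before calling buildDocList.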

michaelgavin/tei2r documentation built on May 22, 2019, 9:50 p.m.