R/williams.R

#' @title williams: A package of data associated with Williams College and tools for creating it.
#'
#' @description
#' The heart of \code{williams} is a pair of data frames, \code{graduates} and \code{faculty}. This
#' document describes how the data were collected and provides instructions for adding data in
#' subsequent years. The college maintains an
#' [archive](http://web.williams.edu/admin/registrar/catalog/archive.html) of
#' annual course catalogs that serves as a rich source of information on faculty and graduating students.
#'
#' \bold{Graduates}
#' The "Degrees Conferred" section of each course catalog lists the names of graduating
#' students (organized by Latin honors conferred), along with information
#' about their senior theses and any related distinctions.
#'
#' For each course catalog, we copy-paste the "Degrees Conferred" section into a text file. We save this in
#' the \code{inst/extdata} directory of the package, using the naming convention
#' "graduates-<year>-<year + 1>.txt".
#'
#' For example, for the course catalog for the 2015-2016 academic year (which lists students who
#' graduated in June 2015), we save the list of students into a text file named
#' "graduates-2015-2016.txt" in the \code{inst/extdata} directory.
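#'
#' As a rough illustration of this naming convention, the sketch below builds the expected file
#' name for a given starting year and looks the file up in an installed copy of the package.
#' The helper \code{graduates_file_name()} is hypothetical and is not part of \code{williams}.
#'
#' ```r
#' # Hypothetical helper (not provided by the package): build the file name
#' # for the catalog whose academic year starts in `year`.
#' graduates_file_name <- function(year) {
#'   paste0("graduates-", year, "-", year + 1, ".txt")
#' }
#'
#' graduates_file_name(2015)
#' # "graduates-2015-2016.txt"
#'
#' # Files stored under inst/extdata are installed into extdata/.
#' # system.file() returns "" if the file or the package cannot be found.
#' path <- system.file("extdata", graduates_file_name(2015), package = "williams")
#' ```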
#'
#' Please note: Copy-pasting from the PDFs is error-prone, so this step is sometimes tedious.
#' Often, details about several graduates are clumped onto a
#' single line (that is, they appear without line breaks). It is essential to separate these
#' lines manually and ensure that each line contains information about only one graduate. We also
#' delete stray items such as page numbers and other detritus.
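#'
#' The manual cleanup can be supported by a quick check of the pasted text. The sketch below is
#' only a heuristic and is not part of the package: the helper \code{flag_suspect_lines()} and its
#' \code{max_chars} threshold are assumptions made for illustration. It flags lines that
#' consist solely of digits (possible page numbers) and unusually long lines (possibly several
#' graduates run together), so they can be reviewed by hand.
#'
#' ```r
#' # Hypothetical helper: report lines that probably still need manual cleanup.
#' # `max_chars` is an arbitrary threshold chosen for illustration.
#' flag_suspect_lines <- function(lines, max_chars = 200) {
#'   page_number <- grepl("^[[:space:]]*[0-9]+[[:space:]]*$", lines)
#'   too_long <- nchar(lines) > max_chars
#'   suspect <- page_number | too_long
#'   data.frame(
#'     line = which(suspect),
#'     reason = ifelse(page_number[suspect], "page number?", "several graduates?"),
#'     stringsAsFactors = FALSE
#'   )
#' }
#'
#' # Made-up input: one ordinary line, one stray page number, one overlong line.
#' example_lines <- c("Jane Q. Example", "17", strrep("x", 250))
#' flagged <- flag_suspect_lines(example_lines)
#' ```
#'
#' In practice the input would come from something like \code{readLines()} on the new text file.
#' The flagged lines still have to be inspected and fixed by hand.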
#'
#' Another complexity that we handle by hand is the apostrophe in "Women's", which appears in both
#' Women's and Gender Studies and Women's, Gender and Sexuality Studies. This apostrophe is
#' copied from the PDFs in an unusual encoding that we had trouble handling, so we replaced it by hand
#' with a plain apostrophe. Files for future years will need the same treatment.
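#'
#' If the problematic character turns out to be the typographic apostrophe U+2019 (an
#' assumption, since the exact character is not recorded here), the replacement could be
#' scripted rather than done by hand. The helper \code{normalize_apostrophes()} below is a
#' hypothetical sketch and is not part of the package.
#'
#' ```r
#' # Hypothetical helper: replace the typographic apostrophe (assumed to be
#' # U+2019, RIGHT SINGLE QUOTATION MARK) with a plain ASCII apostrophe.
#' normalize_apostrophes <- function(lines) {
#'   curly <- intToUtf8(8217)  # 8217 is the decimal code point of U+2019
#'   gsub(curly, "'", lines, fixed = TRUE)
#' }
#'
#' # Build a test string containing the typographic apostrophe.
#' major <- paste0("Women", intToUtf8(8217), "s and Gender Studies")
#' cleaned <- normalize_apostrophes(major)
#' ```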
#'
#' Once the new file has been added and the package rebuilt, running \code{x <- create_graduates()} will generate
#' a new data frame with all the relevant data. Pass the \code{complete = TRUE} argument to get more detailed
#' information.
#'
#' \bold{Faculty}
#' We need a similarly detailed description of how to handle faculty information.
#'
#' @docType package
#' @name williams
NULL
karantibrewal/williams documentation built on May 3, 2019, 9:40 p.m.