williams: A package of data associated with Williams College...

Description

The heart of williams is its two data frames, graduates and faculty. This document describes how the data was collected and provides instructions for adding data in subsequent years. The college maintains an [archive](http://web.williams.edu/admin/registrar/catalog/archive.html) of annual course catalogs that serves as a rich source of information on faculty and graduating students.

Graduates

Under the "Degrees Conferred" section in each course catalog, we find a list of names of graduating students (organized by Latin honor conferred), along with information about their senior theses and any related distinctions.

For each course catalog, we copy-paste the "Degrees Conferred" section into a text file. We save this in the inst/extdata directory of the package, using the naming convention "graduates-<year>-<year + 1>.txt".

For example, for the course catalog for the 2015-2016 academic year (which lists students who graduated in June 2015), we save the list of students in a text file named "graduates-2015-2016.txt" in the inst/extdata directory.
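The naming convention can be sketched in R as follows. This is only an illustration: graduates_file_name() is a hypothetical helper, not a function provided by the package, and it assumes that `year` is the first year of the catalog's academic year.

```r
# Hypothetical helper (not part of the package): build the file name for the
# catalog whose academic year starts in `year`.
graduates_file_name <- function(year) {
  year <- as.integer(year)
  sprintf("graduates-%d-%d.txt", year, year + 1L)
}

# File name and relative path for the 2015-2016 catalog.
file_name <- graduates_file_name(2015)
file_path <- file.path("inst", "extdata", file_name)
print(file_path)
```

For the 2015-2016 catalog this yields the file name used in the example above, placed under inst/extdata.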

Please note: Because copy-pasting from the PDFs is unreliable, the "copy-paste" step is sometimes tedious. Often, details about several graduates are clumped onto a single line (that is, they appear without line breaks). Here, it is essential to separate these lines manually and ensure that each line contains information about only a single graduate. We also delete stray items like page numbers and other detritus.
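Part of this cleanup could be scripted. The sketch below is not what the package does: drop_page_numbers() is a hypothetical helper, and it assumes that a stray page number shows up as a line containing nothing but digits. The sample input is invented for illustration. Clumped lines would still need to be separated by hand.

```r
# Hypothetical helper (not part of the package): remove lines that consist
# only of digits, which we assume are stray page numbers.
drop_page_numbers <- function(lines) {
  is_page_number <- grepl("^[0-9]+$", trimws(lines))
  lines[!is_page_number]
}

# Invented sample input: two graduate lines with a page number between them.
raw_lines <- c("First Graduate", " 27 ", "Second Graduate")
clean_lines <- drop_page_numbers(raw_lines)
print(clean_lines)
```

In practice the input would come from readLines() on the pasted text file, and the result would be written back with writeLines() after a manual check.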

Another complexity that we handle by hand is the apostrophe in "Women's", which appears in both Women's and Gender Studies and Women's, Gender and Sexuality Studies. We had trouble handling this apostrophe because it has an unusual encoding, so we simply replaced it with a plain apostrophe by hand. Files for future years will need the same treatment.
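If the troublesome character turns out to be the typographic right single quotation mark (U+2019), the replacement could be scripted instead of done by hand. That is an assumption, since we have not confirmed which character the PDFs actually contain, and normalize_apostrophes() below is a hypothetical helper rather than part of the package.

```r
# Hypothetical helper (not part of the package): replace the typographic
# apostrophe U+2019 with a plain ASCII apostrophe.
# Assumption: U+2019 is the character that causes the encoding trouble.
normalize_apostrophes <- function(lines) {
  gsub("\u2019", "'", lines, fixed = TRUE)
}

department <- "Women\u2019s and Gender Studies"
fixed_department <- normalize_apostrophes(department)
print(fixed_department)
```

Lines that already use a plain apostrophe pass through unchanged, so running the helper over a whole file more than once is harmless.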

Once the new file is added and the package is rebuilt, running x <- create_graduates() will generate a new data frame with all the relevant data. Use the complete = TRUE argument to get more detailed information.

Faculty

We need a similarly detailed description of how to handle faculty information.


karantibrewal/williamsmetrics documentation built on May 20, 2019, 7:21 a.m.