Reads American Political Science Association (APSA) “eJobs” html files, parses the content of these files into a format for muRL to read, and writes that content to a .csv file.
a character string specifying the directory to which a set of APSA job announcement web pages has been downloaded.
a character string specifying the name of the file to which the data should be written.
a character string specifying the extension of the files from which the data will be harvested.
a logical specifying whether the file name and current working directory should be printed.
After logging in to eJobs, the job announcement site of the American Political Science Association (APSA), the user can search for and find the APSA web page announcing a single job listing. The user can download the html from several such pages (usually with a simple “Save As” command, depending on one's operating system).
apsahtml2csv then parses the html code from these pages, and sorts and stores the relevant content. A .csv file is written containing this content.
If the user downloads the APSA webpages using a different (or no) file extension, that extension (or "") should be specified using the file.ext argument. Because apsahtml2csv uses the value of file.ext in a grep command, we strongly recommend that the directory specified by directory include only the downloaded webpages, and no other files or directories.
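The recommendation above can be checked before running apsahtml2csv. The following is a minimal sketch in base R; check_download_dir is a hypothetical helper, not part of muRL, and the default extension of ".html" is an assumption made for illustration only.

```r
# Hypothetical helper (not part of muRL): list the entries in `directory`
# that are subdirectories or whose names do not end in `file.ext`.
check_download_dir <- function(directory, file.ext = ".html") {
  entries <- list.files(directory)
  # Fixed-string suffix test, so the "." in the extension is not treated
  # as a regular-expression wildcard.
  has_ext <- if (nzchar(file.ext)) {
    endsWith(entries, file.ext)
  } else {
    rep(TRUE, length(entries))
  }
  is_dir <- dir.exists(file.path(directory, entries))
  entries[!has_ext | is_dir]
}

# Example use (the path is a placeholder):
# stray <- check_download_dir("path/to/downloaded/pages")
# if (length(stray) > 0) message("Unexpected entries: ", toString(stray))
```

An empty result suggests that the directory holds only files with the expected extension.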
Institutions are inconsistent in how they enter the names of their jobs' contact representatives. Thus, some tweaking of the output of apsahtml2csv may be required in order to create a .csv file that can be seamlessly read by read.murl. Specifically, the user may have to take the single contact column of the output and create separate name columns from it, such as lname. Additionally, the user may have to adjust the subfield columns, as institutions may report these somewhat differently.
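As a rough illustration of the kind of tweaking described above, the sketch below splits a contact column into name columns in base R. split_contact is a hypothetical helper, not part of muRL; the fname column name is an assumption (only contact and lname appear in this documentation), and the rule "the last word is the last name" will be wrong for entries such as "Doe, Jane" or names with suffixes, which still need manual correction.

```r
# Hypothetical helper (not part of muRL): derive name columns from a
# single `contact` column. `fname` is an assumed column name.
split_contact <- function(jobs) {
  contact <- trimws(jobs$contact)
  # Treat everything after the last whitespace as the last name.
  jobs$lname <- sub("^.*\\s", "", contact)
  # Treat whatever precedes the last word as the first name(s).
  jobs$fname <- trimws(sub("\\S+$", "", contact))
  jobs
}

# Example use on a small made-up data frame:
jobs <- data.frame(contact = c("Jane Q. Doe", "Smith"),
                   stringsAsFactors = FALSE)
jobs <- split_contact(jobs)
```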
An R dataframe is created and a .csv file is written. These include columns containing the APSA job listing ID number, the date the job advertisement was posted, the type of institution, the title and subfield of the position, the start date, salary, and region, the name of the institution and department, the name, address, city, state, ZIP code, and phone number of the individual to contact, the department or institution's web address, and a full paragraph description of the position.
The full paragraph description is stored in a column named desc. Due to the current parsing strategy, this field may include some excess characters from the APSA html page.
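If the excess characters get in the way, the desc column can be tidied after the fact. The sketch below is one possible approach in base R, assuming the excess consists of leftover html tags, a couple of common entities, and runs of whitespace; the documentation does not say exactly which characters remain, and clean_desc is a hypothetical helper, not part of muRL.

```r
# Hypothetical helper (not part of muRL): remove assumed html residue
# from a character vector such as the `desc` column.
clean_desc <- function(x) {
  x <- gsub("<[^>]*>", " ", x)               # drop leftover tags
  x <- gsub("&nbsp;", " ", x, fixed = TRUE)  # non-breaking space entity
  x <- gsub("&amp;", "&", x, fixed = TRUE)   # ampersand entity
  x <- gsub("[[:space:]]+", " ", x)          # collapse whitespace runs
  trimws(x)
}

# Example use, where `jobs` stands for the data frame described above:
# jobs$desc <- clean_desc(jobs$desc)
```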