library(WilliamsFaculty)
library(xlsx)
library(DT)
library(ggplot2)
# Load the first sheet of the spreadsheet bundled with the package
data <- read.xlsx(system.file("extdata", "data.xlsx", package = "WilliamsFaculty"), sheetIndex = 1)

Introduction

This package explores the age data of all Williams faculty, with special attention paid to two factors: department and gender. The package provides functions that generate summary tables and visual presentations of the data. Some functions generate different summaries and plots depending on the user's input, which sets the focus to department or gender. The package also contains a function that performs statistical analysis on the data.

The original question that this package seeks to answer is: What is the average age of Williams faculty? The function average_age() answers that question directly by returning a numerical value. The other functions explore the relationship between age and gender/department, both visually and statistically. The package relies on ggplot2 for plots and DT for interactive data tables.

Data

The data frame used by this package is internal and ships with the package itself. The original data was retrieved from http://web.williams.edu/admin/registrar//catalog/archive.html under the 2013-14 school year. The professors' names, departments, genders, and ages were compiled by hand into an Excel file, found under /inst/extdata, which was then converted into an RData file using the "xlsx" package. The names and departments are copied directly from the Williams archive, but gender is guessed from each professor's first name, and age is estimated from the year the professor received a BA, assuming the professor was 22 years old at that time. A few visiting professors/lecturers whose education records could not be located have been omitted from the dataset. Some departments with very few professors have been combined with other departments in the same field to form categories, such as "Physics/Astronomy" and "Theatre/Dance". The following is a histogram of all the ages, generated by one of the package's functions:

plot_age()

The distribution is slightly right-skewed. More characteristics of the distribution will be included in a summary table in the next section.
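The age estimate described in this section (BA year plus an assumed graduation age of 22) can be sketched roughly as follows. The helper estimate_age(), its arguments, and the reference year of 2013 are assumptions made for illustration; the package does not document which reference year was used, and the BA years below are invented.

```r
# Hypothetical helper, not part of WilliamsFaculty: estimate a professor's
# age from the year the BA was received, assuming graduation at age 22.
# reference_year = 2013 is an assumption, not a value taken from the package.
estimate_age <- function(ba_year, reference_year = 2013, age_at_ba = 22) {
  reference_year - ba_year + age_at_ba
}

# Made-up BA years, for illustration only
ba_years <- c(1975, 1990, 2005)
estimated <- estimate_age(ba_years)
print(estimated)
```

Because every professor is assumed to graduate at 22, the estimate is only as accurate as that assumption; anyone who finished a BA earlier or later than usual will be off by the same number of years.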

Use WilliamsFaculty

To find the average age of the Williams faculty, use the function average_age(), which returns a numerical value (all numerical values are rounded to two decimal places). To learn more about the age data in general, use plot_age() to generate the histogram shown above, and age_summary() to return a summary table like this:

age_summary()
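For readers who want to see the kind of computation involved, here is a minimal base-R sketch of a rounded mean and a summary of ages. It uses an invented vector of ages rather than the package's data, and it is not the implementation of average_age() or age_summary().

```r
# Toy ages, invented for illustration; not the package's data
ages <- c(34, 41, 47, 52, 58, 63, 70)

# Mean age rounded to two decimal places
mean_age <- round(mean(ages), 2)

# Minimum, quartiles, median, mean, and maximum, also rounded
age_stats <- round(summary(ages), 2)

print(mean_age)
print(age_stats)
```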

With the DT package, the function dt_table() returns an interactive data table with all of the data. The user can sort and search within each category in this table:

datatable(data, options = list(pageLength=6))

The rest of the functions in the package help the user explore the relations between age and department/gender. To start, age_summary() can generate summary tables with a focus on gender by setting gender=TRUE:

age_summary(gender=TRUE)

The user can also set dpmt=TRUE to generate a DT table with data focusing on departments:

age_summary(dpmt=TRUE)
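A grouped summary of this kind can be approximated in base R with aggregate(). The sketch below uses an invented data frame, and the column names Age, Gender, and Department are assumptions; the columns of the package's internal data frame may be named differently.

```r
# Toy data frame with assumed column names; values are invented
faculty <- data.frame(
  Age = c(35, 48, 61, 42, 55, 67),
  Gender = c("F", "M", "F", "M", "F", "M"),
  Department = c("Languages", "Languages",
                 "Physics/Astronomy", "Physics/Astronomy",
                 "Theatre/Dance", "Theatre/Dance")
)

# Mean age per gender, rounded to two decimal places
by_gender <- aggregate(Age ~ Gender, data = faculty,
                       FUN = function(x) round(mean(x), 2))

# Mean age per department, rounded to two decimal places
by_dpmt <- aggregate(Age ~ Department, data = faculty,
                     FUN = function(x) round(mean(x), 2))

print(by_gender)
print(by_dpmt)
```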

To visualize the department and gender data, the user can use the functions plot_by_dpmt(), plot_by_gender(), color_by_dpmt(), and color_by_gender(). The first two functions generate boxplots of the ages in each department and for each gender:

plot_by_dpmt()
plot_by_gender()
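A comparable boxplot can be built directly with ggplot2. This is a sketch on invented data with assumed column names (Age, Gender, Department), not the code behind plot_by_dpmt() or plot_by_gender().

```r
library(ggplot2)

# Toy data frame with assumed column names; values are invented
faculty <- data.frame(
  Age = c(35, 48, 61, 42, 55, 67, 39, 58, 44),
  Gender = c("F", "M", "F", "M", "F", "M", "F", "M", "F"),
  Department = rep(c("Languages", "Physics/Astronomy", "Theatre/Dance"),
                   each = 3)
)

# One box per department
p_dpmt <- ggplot(faculty, aes(x = Department, y = Age)) +
  geom_boxplot() +
  labs(title = "Age by department (toy data)")

# One box per gender
p_gender <- ggplot(faculty, aes(x = Gender, y = Age)) +
  geom_boxplot() +
  labs(title = "Age by gender (toy data)")

print(p_dpmt)
print(p_gender)
```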

The function plot_by_dpmt() also accepts a department name. To see the distribution of ages within a specific department, pass its name to the function, for example plot_by_dpmt("Languages"):

plot_by_dpmt("Languages")

Note that the histograms are also outlined to give a sense of the gender division within that department. The other two functions, color_by_gender() and color_by_dpmt(), generate colored scatter plots in which each professor is a point. The color_by_gender() function separates male and female professors by color, while the color_by_dpmt() function separates the professors in each department by color:

color_by_gender()
color_by_dpmt()

Note that the youngest and oldest professors are marked. The index (x-axis) is the professor's index in the original data, and since the data is in alphabetical order, the index gives a sense of what the professor's last name starts with. As with plot_by_dpmt(), the user can restrict color_by_dpmt() to one or more specific departments by passing their names:

color_by_dpmt(c("Physics/Astronomy","Theatre/Dance","Languages"))
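The idea behind these scatter plots (row index on the x-axis, age on the y-axis, color by group, extremes labeled) can be sketched with ggplot2 as follows. The data and the column names Age, Gender, and Index are assumptions for illustration; this is not the implementation of color_by_gender() or color_by_dpmt().

```r
library(ggplot2)

# Toy data frame with assumed column names; values are invented
faculty <- data.frame(
  Age = c(48, 35, 61, 42, 67, 55),
  Gender = c("M", "F", "F", "M", "M", "F")
)

# Row position stands in for the alphabetical index described above
faculty$Index <- seq_len(nrow(faculty))

# Rows holding the youngest and oldest professors
extremes <- faculty[c(which.min(faculty$Age), which.max(faculty$Age)), ]

# Scatter plot colored by gender, with the two extremes labeled by age
p <- ggplot(faculty, aes(x = Index, y = Age, colour = Gender)) +
  geom_point() +
  geom_text(data = extremes, aes(label = Age),
            vjust = -1, show.legend = FALSE)

print(p)
```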

Results

Important results include the mean age and the results of the statistical analysis performed on the data with regard to department and gender. The mean age is about 51, calculated with average_age().

average_age()

To explore the relationship between gender and age, the user can use test_age(gender=TRUE,dpmt=FALSE), which first performs a binomial test on the number of female professors, and then performs a two-sample t-test on the ages of professors of each gender.

test_age(gender=TRUE, dpmt=FALSE)
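The two tests named above are available in base R as binom.test() and t.test(). The sketch below runs them on invented data with assumed column names. The null proportion of 0.5 and the use of R's default Welch t-test are assumptions; the text does not say which settings test_age() uses.

```r
# Toy data frame with assumed column names; values are invented
faculty <- data.frame(
  Age = c(35, 48, 61, 42, 55, 67, 39, 58),
  Gender = c("F", "M", "M", "M", "F", "M", "F", "M")
)

# Binomial test: is the share of female professors different from 0.5?
# (p = 0.5 is an assumed null hypothesis)
n_female <- sum(faculty$Gender == "F")
bt <- binom.test(n_female, nrow(faculty), p = 0.5)

# Two-sample t-test comparing mean age between genders
# (R's default is the Welch test, which does not assume equal variances)
tt <- t.test(Age ~ Gender, data = faculty)

print(bt)
print(tt)
```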

To explore the relationship between department and age, the user can use test_age(gender=FALSE,dpmt=TRUE), which performs an analysis of variance (ANOVA) on the ages of the professors across departments.

test_age(gender=FALSE, dpmt=TRUE)
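A one-way analysis of variance comparing mean age across departments can be run in base R with aov(). This sketch uses invented data and assumed column names, and is offered only as an illustration of the technique, not as the code inside test_age().

```r
# Toy data frame with assumed column names; values are invented
faculty <- data.frame(
  Age = c(35, 48, 41, 61, 42, 57, 55, 67, 60),
  Department = rep(c("Languages", "Physics/Astronomy", "Theatre/Dance"),
                   each = 3)
)

# One-way ANOVA: does mean age differ between departments?
fit <- aov(Age ~ Department, data = faculty)

# ANOVA table with degrees of freedom, F statistic, and p-value
anova_table <- summary(fit)[[1]]
print(anova_table)
```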

To explore whether department and gender are independent, the user can use test_age(), which performs a chi-squared test of independence on the number of professors of each gender and in each department.

test_age()
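A chi-squared test of independence operates on a table of counts, here gender by department. The sketch below applies base R's chisq.test() to an invented contingency table; the counts are made up and do not reflect the package's data.

```r
# Invented counts of professors by gender (rows) and department (columns)
counts <- matrix(
  c(12, 18,   # Languages: F, M
     5, 15,   # Physics/Astronomy: F, M
     8,  7),  # Theatre/Dance: F, M
  nrow = 2,
  dimnames = list(
    Gender = c("F", "M"),
    Department = c("Languages", "Physics/Astronomy", "Theatre/Dance")
  )
)

# Chi-squared test of independence between gender and department
res <- chisq.test(counts)
print(res)

# Expected counts under independence; the approximation is usually
# considered reasonable when these are all at least 5
print(res$expected)
```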

The results of the analysis seem to suggest that:

Conclusion

The WilliamsFaculty package calculates the average age of Williams College faculty and presents the data in an interactive way. In addition, the package analyzes the relationships between age and department and between age and gender, providing functions for both statistical analysis and visualization of these relationships.

The package contains an internal data frame, called data, and the package's contents only work with that particular data frame. A helpful future modification would be to allow users to import their own data frames or raw data files.



yz4/WilliamsFaculty documentation built on May 4, 2019, 8:47 p.m.