R/data-ucla_textbooks_f18.R

#' Sample of UCLA course textbooks for Fall 2018
#'
#' A sample of courses was collected from UCLA for Fall 2018, and the
#' corresponding textbook prices were collected from the UCLA bookstore and
#' from Amazon.
#'
#' A past dataset was collected from UCLA courses in Spring 2010, and Amazon
#' prices at that time were found to be almost uniformly lower than those of
#' the UCLA bookstore.  Now in 2018, the UCLA bookstore is about even with
#' Amazon on the vast majority of titles, and there is no statistically
#' significant difference in the sample data.
#'
#' When a course required more than one book, the most expensive required
#' book was generally the one recorded.
#'
#' We advocate for using raw dollar differences instead of percent
#' differences because a 20\% savings on a $10 book is minor relative to a
#' 20\% savings on a $100 book.  With percent differences, a small and largely
#' insignificant price difference on low-priced books would balance
#' numerically (but not in a practical sense) a moderate but important price
#' difference on more expensive books.  So while using raw differences tends
#' to result in a bit less sensitivity in detecting \emph{some} effect, we
#' believe the absolute difference compares prices in a more meaningful way.
#'
#' Used prices include the shipping cost but not tax.  The used prices are a
#' more nuanced comparison, since these all come from third-party sellers.
#' Amazon is often more a marketplace than a retail site at this point, and
#' many people buy from third-party sellers on Amazon without realizing it.
#' The relationship Amazon has with third-party sellers is also challenging.
#' Given the frequently changing dynamics in this space, we don't think any
#' analysis here will be very reliable for long-term insights, since products
#' from these sellers change frequently in quantity and price.  For this
#' reason, we focus only on new books sold directly by Amazon in our
#' comparison.  In a future round of data collection, it may be interesting to
#' explore whether the dynamics have changed in the used market.
#'
#' @name ucla_textbooks_f18
#' @docType data
#' @format A data frame with 201 observations on the following 20 variables.
#' \describe{
#'   \item{year}{Year the course was offered}
#'   \item{term}{Term the course was offered}
#'   \item{subject}{Subject}
#'   \item{subject_abbr}{Subject abbreviation, if any}
#'   \item{course}{Course name}
#'   \item{course_num}{Course number, complete}
#'   \item{course_numeric}{Course number, numeric only}
#'   \item{seminar}{Boolean indicating whether this is a seminar course}
#'   \item{ind_study}{Boolean indicating whether this is some form of independent study}
#'   \item{apprenticeship}{Boolean indicating whether this is an apprenticeship}
#'   \item{internship}{Boolean indicating whether this is an internship}
#'   \item{honors_contracts}{Boolean indicating whether this is an honors contracts course}
#'   \item{laboratory}{Boolean indicating whether this is a lab}
#'   \item{special_topic}{Boolean indicating whether this is any of the special types of courses listed}
#'   \item{textbook_isbn}{Textbook ISBN}
#'   \item{bookstore_new}{New price at the UCLA bookstore}
#'   \item{bookstore_used}{Used price at the UCLA bookstore}
#'   \item{amazon_new}{New price sold by Amazon}
#'   \item{amazon_used}{Used price on Amazon (third-party sellers)}
#'   \item{notes}{Any relevant notes}
#' }
#' @seealso \code{\link{textbooks}}, \code{\link{ucla_f18}}
#' @source \url{https://sa.ucla.edu/ro/public/soc}
#'
#' \url{https://ucla.verbacompare.com}
#'
#' \url{https://www.amazon.com}
#' @keywords datasets
#' @examples
#'
#' library(ggplot2)
#' library(dplyr)
#'
#' ggplot(ucla_textbooks_f18, aes(x = bookstore_new, y = amazon_new)) +
#'   geom_point() +
#'   geom_abline(slope = 1, intercept = 0, color = "orange") +
#'   labs(
#'     x = "UCLA Bookstore price", y = "Amazon price",
#'     title = "Amazon vs. UCLA Bookstore prices of new textbooks",
#'     subtitle = "Orange line represents y = x"
#'   )
#'
#' # The following outliers were double checked for accuracy
#' ucla_textbooks_f18_with_diff <- ucla_textbooks_f18 |>
#'   mutate(diff = bookstore_new - amazon_new)
#'
#' ucla_textbooks_f18_with_diff |>
#'   filter(diff > 20 | diff < -20)
#'
#' # Distribution of price differences
#' ggplot(ucla_textbooks_f18_with_diff, aes(x = diff)) +
#'   geom_histogram(binwidth = 5)
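#'
#' # Illustrative sketch, not part of the original analysis: raw versus
#' # percent differences. The same 20 percent saving is $2 on a $10 book but
#' # $20 on a $100 book.
#' 0.20 * c(10, 100)
#'
#' # `pct_diff` is a hypothetical column name introduced here for comparison;
#' # rows with a missing price on either side are dropped first.
#' ucla_textbooks_f18_with_diff |>
#'   filter(!is.na(diff)) |>
#'   mutate(pct_diff = 100 * diff / bookstore_new) |>
#'   select(bookstore_new, amazon_new, diff, pct_diff) |>
#'   arrange(bookstore_new) |>
#'   head()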
#'
#' # t-test of price differences
#' t.test(ucla_textbooks_f18_with_diff$diff)
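#'
#' # Sketch added for illustration: a paired t-test on the two price columns
#' # is equivalent to a one-sample t-test on their differences, so it should
#' # agree with the one-sample test on `diff`. Pairs with a missing price are
#' # dropped by t.test().
#' t.test(
#'   ucla_textbooks_f18$bookstore_new,
#'   ucla_textbooks_f18$amazon_new,
#'   paired = TRUE
#' )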
"ucla_textbooks_f18"
OpenIntroStat/openintro documentation built on June 4, 2024, 4:19 a.m.