AirlineArrival: Airline On-Time Arrival Data

AirlineArrival    R Documentation

Airline On-Time Arrival Data

Description

Flights categorized by destination city, airline, and whether or not the flight was on time.

Format

A data frame with 11000 observations on the following 3 variables.

airport

a factor with levels LosAngeles, Phoenix, SanDiego, SanFrancisco, Seattle

result

a factor with levels Delayed, OnTime

airline

a factor with levels Alaska, AmericaWest

Source

Barnett, Arnold. 1994. “How numbers can trick you.” Technology Review, vol. 97, no. 7, pp. 38–45.

References

These and similar data appear in many textbooks as an illustration of Simpson's paradox.

Examples


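# AirlineArrival ships with the package documented on this page; attach that
# package first. The examples also use mosaic, dplyr, and ggformula.
library(mosaic)     # tally()
library(dplyr)      # %>%, group_by(), summarise(), mutate(), filter()
library(ggformula)  # gf_line(), gf_point(), gf_hline(), gf_labs()

# Percent of flights belonging to each airline, within each result category
tally(
  airline ~ result, data = AirlineArrival,
  format = "percent", margins = TRUE)

# Percent delayed and on time for each airline at each airport
tally(
  result ~ airline + airport,
  data = AirlineArrival, format = "percent", margins = TRUE)

# Percent delayed for each airline at each airport
AirlineArrival2 <-
  AirlineArrival %>%
  group_by(airport, airline, result) %>%
  summarise(count = n(), .groups = "drop") %>%
  group_by(airport, airline) %>%
  mutate(total = sum(count), percent = count / total * 100) %>%
  filter(result == "Delayed")

# Percent delayed for each airline over all airports combined
AirlineArrival3 <-
  AirlineArrival %>%
  group_by(airline, result) %>%
  summarise(count = n(), .groups = "drop") %>%
  group_by(airline) %>%
  mutate(total = sum(count), percent = count / total * 100) %>%
  filter(result == "Delayed")

# Per-airport delay rates (points sized by number of flights), with each
# airline's overall delay rate drawn as a dashed horizontal line
gf_line(percent ~ airport, color = ~ airline, group = ~ airline,
        data = AirlineArrival2) %>%
  gf_point(percent ~ airport, color = ~ airline, size = ~ total,
           data = AirlineArrival2) %>%
  gf_hline(yintercept = ~ percent, color = ~ airline,
           data = AirlineArrival3, linetype = "dashed") %>%
  gf_labs(y = "percent delayed")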
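
The comparison behind Simpson's paradox can also be checked without any add-on packages. The following is a minimal base-R sketch, not part of the package: `percent_delayed()` is a hypothetical helper, and `toy` is a small made-up data set (airlines A and B, airports X and Y) that only borrows the column names of AirlineArrival. Its counts are invented so that the reversal is easy to verify by hand; they are not the real flight counts.

```r
# Hypothetical helper (not part of any package): percent of delayed flights
# within each combination of the grouping columns named in `by`.
percent_delayed <- function(data, by) {
  100 * tapply(data$result == "Delayed", data[by], mean)
}

# Made-up counts, chosen only to illustrate the paradox.
toy_counts <- data.frame(
  airline = c("A", "A", "A", "A", "B", "B", "B", "B"),
  airport = c("X", "X", "Y", "Y", "X", "X", "Y", "Y"),
  result  = rep(c("Delayed", "OnTime"), times = 4),
  n       = c(10, 90, 270, 630, 135, 765, 35, 65)
)

# Expand the counts to one row per flight, the layout AirlineArrival uses.
toy <- toy_counts[rep(seq_len(nrow(toy_counts)), times = toy_counts$n),
                  c("airline", "airport", "result")]

overall    <- percent_delayed(toy, "airline")                # one rate per airline
by_airport <- percent_delayed(toy, c("airline", "airport"))  # airline x airport matrix

# Airline A has the lower delay rate at every airport ...
all(by_airport["A", ] < by_airport["B", ])
# ... yet the higher delay rate once the airports are pooled.
overall["A"] > overall["B"]
```

If AirlineArrival is loaded, the same helper should apply unchanged, for example `percent_delayed(AirlineArrival, c("airline", "airport"))`, since it relies only on the `airport`, `airline`, and `result` columns described under Format.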

rpruim/fastR documentation built on Nov. 12, 2023, 12:26 p.m.