library(trainpack)
library(ggplot2)

Trainpack Vignette

Introduction

The idea of our package is to design the functions that will feed our shiny application. We produced an application on which people can bet on the delay of their train. Inspired by the sports betting websites (betfair, pmu, winamax ...), we created a simple display where someone can bet on the delay of a train. We also wanted to display some facts and useful data to help the bettor placing his/her bet. Therefore, the main page of our application will be a simple menu to choose the train on which to bet. Once chosen, the odds are displayed with some facts that might influence the probability of the train to be late. There is also a space where to put the amount of money to bet. Once confirmed, the potential gain is displayed.

Datasets

To build this package, we started from the train data from the open data website of the SNCF. Unfortunately, we only have access to the monthly data of every TGV line and not the detail of all the trains. We met the SNCF head of data at a forum in Paris in october and asked him whether it was possible to recover this kind of data for all the trains but it is not yet possible so we adapted our calculations and functions accordingly. We cleaned the dataset to remove the points and the accent form the column name in order to be able to put it on github and export it in other countries.

head(trainpack::SNCF_regularite)

Then we wanted to include maps that will show the delays of the train depending on the departement of the arrival station. We created a To create the map we need the department associated with the departure and arrival train station. To automate this search and implement it in a scalable way we used google maps api to get from a string the first result latitude and longitude, then we converted it back to a postal code and country (to filter out non French destinations). Then we needed to add the code per train line. Once we have a table with line names, departement codes and cotes we can run our carto function.

head(trainpack::departement)

Finally we wanted to include an indicator of the probability for a strike to influence the delay of a train. We thought the number of articles containing the words 'grève sncf' published on LeMonde.fr would be a good proxi to estimate whether there were a strike at this time or not. From the research tool on the website, we had access to all the articles on the time period we are working on, so we scraped all the dates of the articles to get a final dataframe with the number of article of interest per month over the entire period.

head(trainpack::lemonde_dates)

Functions

Given this dataset, we first needed a function that will return the odds of the train to be late. To identify the train, the easiest characterisitcs where the station of departure and arrival, and the month of the travel. The function is very simple but easily complexifiable: the idea was to build something simple and maybe complexify, affine our predictions of the probability of a specific train to arrive late in the future. To do this, the only thing to modify in the entire package would be the function "fonction_cote". Now, it returns something very simple.

fonction_cote("PARIS MONTPARNASSE", "NANTES", 3)

Then we needed other functions that would be able to identify the train selected and provide the related data about the usual reasons for its delays, its average delay over the last 3 years, the number of articles mentionning the words "grève sncf" on LeMonde.fr during the last month, the map of the trip, the average delay in the departement of arrival ...

fonction_raison("PARIS MONTPARNASSE", "NANTES", 3)
fonction_article(11)
carto("PARIS MONTPARNASSE")

ALl those facts are included in diagrams or in boxes on the appliaction and allows to inform the bettor.

Finally we added another small function called "ypep" just for fun that returns a picture of the CEO of SNCF when writing the good password.

App user guide

First tab

The first tab is a small presentation of the app, a user-friendly welcome message and a small presentation of our team.

Second tab

The second tab allows the bettor to place his/her bet on the train of his choice by choosing the departure station, the arrival station, the month of departure and obvously, the amount he or she wants to bet. It returns the odds and the potential gain the bettor can win if his/her prediction is true.

Third tab

The third tab present the results of our analysis in a convenient way. The bettor have access to a variety of graphs such as the average delay, the reasons for delays, the proportion of train delayed.

Fourth tab

The fourth tab is dedicated to the map on which the bettor can see the records of delays of the trains departing from a specific station per departement.



ZiggerZZ/Rproject documentation built on May 31, 2019, 6:40 p.m.