knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(whistleblower)

Significant variation in locations:

Text with no locations, Text with multiple locations, Text with abbreviations,

Sequence for addressing locations:

  1. source_tweets.R in the inst folder uses the Twitter search API to download a sample and saves the sample to the extdata folder with a date suffix in the file name.

  2. tweets_dataset.R in the data-raw folder combines the files in the extdata folder into a single source file and save it to the data folder.

  3. locations_dataset.R in the data-raw folder extracts the unique locations from the tweets.rda file in the data folder and saves the locations as locations_raw.csv in the data folder.

  4. Open the locations_raw.csv file in Microsoft Excel and append the city, state and country fields manually by referencing the location column. If there is no location, or if the location does not indicate a city, state and/or country, mark the city, state and country as NA (missing value). Save the file as locations_cleaned.csv.

  5. clean_tweets.R in the R folder contains a function to load the locations_cleaned.csv file as a data frame and joins the appended location data (city, state and country) to the tweets source data.



dtminnick/whistleblower documentation built on Nov. 14, 2019, 2:45 p.m.