This package includes functions to query the Article Search API of The New York Times for articles with an “India” location keyword. It also includes functions to prepare this data for analysis, as well as a shiny app to visualize the output dataset.
The impetus to restructure what was originally a data analysis project as a package was:
Accordingly, this is mostly a personal package, but it can be installed from GitHub:
install.packages("devtools")
devtools::install_github("seanangio/nytindia")
To query all articles with an India location keyword from the NYT Article Search API between two dates, run the following in the top level folder of a new RStudio project.
library(nytindia)
nyt_get_data(begin_date = "YYYY-MM-DD",
             end_date = "YYYY-MM-DD")
You’ll need environment variables called “NYTIMES_KEY” and “NYT_USER_AGENT”.
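One way to confirm that both variables are visible to the R session before querying is a quick check with Sys.getenv(). This is a minimal sketch, not part of the package: the helper name check_nyt_env() is hypothetical, and in practice the values would usually live in a .Renviron file rather than be set in a script.

```r
# Hypothetical helper (not part of nytindia): report which of the
# required environment variables are unset or empty in this session.
check_nyt_env <- function(vars = c("NYTIMES_KEY", "NYT_USER_AGENT")) {
  values <- Sys.getenv(vars, unset = "")
  missing <- vars[values == ""]
  if (length(missing) > 0) {
    message("Missing environment variables: ",
            paste(missing, collapse = ", "))
  }
  invisible(missing)
}

# For a single session the variables can be set directly; the values
# below are placeholders, not real credentials.
# Sys.setenv(NYTIMES_KEY = "your-api-key",
#            NYT_USER_AGENT = "your-user-agent")

check_nyt_env()
```

If the function prints nothing, both variables are set to non-empty values.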
If the directory is empty, leaving begin_date empty will default to 1851 (the earliest available date). Otherwise, it will begin from the last date found. Leaving end_date empty will default to the current date.
It queries up to the closest completed month.
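The “closest completed month” rule can be illustrated in base R. This is only a sketch of the idea, not the package’s own code: last_completed_month_end() is a hypothetical helper, and it assumes the cutoff is the last day of the month before the given date.

```r
# Hypothetical helper: last day of the most recent fully completed
# month relative to `today` (assumed interpretation of the cutoff).
last_completed_month_end <- function(today = Sys.Date()) {
  # First day of the month containing `today`
  first_of_month <- as.Date(format(today, "%Y-%m-01"))
  # One day earlier is the last day of the previous month
  first_of_month - 1
}

last_completed_month_end(as.Date("2021-03-15"))
# "2021-02-28"
```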
The query searches for all articles with an “India” location keyword. You should be able to change the default query with the parameters “q” and “fq”. See the reference documentation here.
Once you have your data from the API, there is a pipeline of functions available to prepare the data. You can run the following to create the prepared dataset.
nyt_build_data()
This should successfully output a dataset in a folder called nyt_shiny_app, but there are a number of manual steps that should ideally be included, as explained in the reference documentation and the Technical Details vignette.
These are optional, but without them:
That’s why nyt_build_data() is especially useful when just updating a dataset with a new month, when the lookup tables and geocoding have already been done.
To build the dataset step by step, use the following script. This is essentially what nyt_build_data() is doing.
# 01-query-nyt-api.R
api_df <- nyt_bind_api_files()
# 02-prepare-nested.R
combined_df <- nyt_clean_api_tbl(api_df)
# 03-clean-news-desks.R
nested_df <- nyt_clean_news_desks(combined_df)
# 04-unnest-df.R
unnested_df <- nyt_unnest_df(nested_df)
# 05-clean-keywords.R
consolidated_unnested_df <- nyt_clean_keywords(unnested_df)
# 06-fix-keywords.R
unnested_df_values_fixed <- nyt_fix_keywords(consolidated_unnested_df)
# 07-query-mapquest.R
nyt_query_mapquest_api(unnested_df_values_fixed)
# 08-add-coords-countries.R
full_unnested_df <- nyt_join_coords_countries(unnested_df_values_fixed)
# 09-re-nest-keywords.R
full_nested_df <- nyt_re_nest_keywords(full_unnested_df)
# 10-write-final-nested-df.R
nyt_write_final_nested_df(full_nested_df)
# 11-download-shiny-files.R
nyt_download_shiny_files()
The package includes a shiny app to visualize the results in many different ways.
To run it locally, you’ll need the following packages, which are listed in the “Suggests” section of this package’s DESCRIPTION file.
shiny_pkgs <- c("shiny", "markdown", "ggplot2", "forcats", "scales",
                "shinyWidgets", "bsplus", "shinycssloaders", "DT",
                "ggiraph", "tidytext", "tsbox", "dygraphs", "gt",
                "leaflet", "leaflet.extras", "shinydashboard",
                "waiter")
install.packages(shiny_pkgs)
nyt_run_example("nyt_india_app")
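To avoid reinstalling packages that are already present, one option is to install only those that are missing. The following is a minimal sketch in base R; missing_pkgs() is a hypothetical helper, not a function exported by this package.

```r
# Hypothetical helper: return the subset of `pkgs` whose namespaces
# cannot be loaded from the current library paths.
missing_pkgs <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Example usage with the shiny_pkgs vector defined above:
# to_install <- missing_pkgs(shiny_pkgs)
# if (length(to_install) > 0) install.packages(to_install)
```

Note that requireNamespace() loads each package’s namespace as a side effect of the check, which is harmless here since the app needs them anyway.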
You can find package vignettes analyzing the data and summarizing the technical details.