GetGDELT: Download and subset GDELT V1 event data

GetGDELT R Documentation

Description

Download the GDELT V1 Event files necessary for a data set, import them, filter on various criteria, and return a data.frame.

Usage

GetGDELT(
  start_date,
  end_date = start_date,
  row_filter,
  ...,
  local_folder = tempdir(),
  max_local_mb = Inf,
  data_url_root = "http://data.gdeltproject.org/events/",
  verbose = TRUE
)

Arguments

start_date

character, earliest date to include in "YYYY-MM-DD" format.

end_date

character, latest date to include in "YYYY-MM-DD" format.

row_filter

<data-masking> Row selection. Expressions that return a logical value, and are defined in terms of the variables in GDELT. If multiple expressions are included, they are combined with the & operator. Only rows for which all conditions evaluate to TRUE are kept.

...

<tidy-select> Column selection. This takes the form of one or more unquoted expressions separated by commas. Variable names can be used as if they were positions in the data frame, so expressions like x:y can be used to select a range of variables.

local_folder

character, folder where downloaded files will be saved. Defaults to tempdir().

max_local_mb

numeric, the maximum size in MB of the downloaded files that will be retained.

data_url_root

character, URL for the folder with GDELT data files.

verbose

logical, if TRUE then indications of progress will be displayed.

Details

Dates are parsed with guess_datetime in the datetimeutils package. The recommended format is "YYYY-MM-DD".

If local_folder is not specified then downloaded files are stored in tempdir(). If a needed file has already been downloaded to local_folder then the local copy is used instead of being downloaded again. Because tempdir() is specific to the current R session, pointing local_folder at a persistent folder can greatly speed up later calls.
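The reuse of already-downloaded files can be pictured with a small sketch. The helper below is hypothetical and is not part of GDELTtools; its name, the `downloader` argument, and the file name in the usage note are assumptions made for illustration. It only shows the general "check the folder first, download if missing" pattern described above.

```r
# Hypothetical helper, NOT the package's implementation: return the local path
# of a data file, downloading it only when no local copy exists yet.
fetch_if_missing <- function(file_name,
                             local_folder = tempdir(),
                             data_url_root = "http://data.gdeltproject.org/events/",
                             downloader = utils::download.file) {
  # Make sure the target folder exists before looking inside it.
  dir.create(local_folder, showWarnings = FALSE, recursive = TRUE)
  local_path <- file.path(local_folder, file_name)

  # Only go to the network when the file is not already cached.
  if (!file.exists(local_path)) {
    downloader(paste0(data_url_root, file_name), local_path, mode = "wb")
  }
  local_path
}
```

Under these assumptions, calling `fetch_if_missing("1979.zip", "~/gdeltdata")` twice would download the file on the first call and simply return the cached path on the second. The file name here is illustrative only.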

Value

A data.frame containing the rows that satisfy row_filter and the columns selected with ....

Filtering Results

The row_filter is passed to filter, so any expression that filter accepts can be used to choose rows, including conditions combined with & and |. See the filter documentation for the full range of options.
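As a rough illustration of how such expressions behave, the sketch below applies dplyr's filter to a toy data frame. It assumes the filter in question is dplyr's; the data values are invented and merely borrow a few GDELT column names.

```r
library(dplyr)

# Toy stand-in for GDELT events; the values are invented for illustration.
events <- data.frame(
  ActionGeo_CountryCode = c("US", "US", "RS", NA),
  NumArticles           = c(2, 5, 2, 2),
  Actor2Code            = c("COP", "MED", "COP", "GOV"),
  stringsAsFactors      = FALSE
)

# Conditions joined with & must all be TRUE for a row to be kept.
both <- filter(events, ActionGeo_CountryCode == "US" & NumArticles == 2)

# Conditions joined with | keep a row when either side is TRUE.
either <- filter(events, Actor2Code == "COP" | Actor2Code == "MED")

# A comparison against NA evaluates to NA, and such rows are dropped.
us_only <- filter(events, ActionGeo_CountryCode == "US")
```

The same expressions, written without the data frame argument, are what would be supplied as row_filter.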

Selecting Columns

The ... is passed to select, so columns can be chosen by position, by name, by range, or with selection helpers such as contains and starts_with. See the select documentation for the full range of options.
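The sketch below shows a few of these selection styles on a toy data frame, assuming the select in question is dplyr's. The values are invented; only the column names are borrowed from GDELT.

```r
library(dplyr)

# Toy stand-in for GDELT events; the values are invented for illustration.
events <- data.frame(
  GLOBALEVENTID    = 1:2,
  SQLDATE          = c(19790101L, 19790102L),
  Actor1Code       = c("USA", "RUS"),
  Actor2Code       = c("COP", "MED"),
  NumArticles      = c(2, 5),
  stringsAsFactors = FALSE
)

# Select by position: the first two columns.
by_position <- select(events, 1:2)

# Select with helpers; matching ignores case by default.
by_helper <- select(events, contains("date"), starts_with("actor"))

# Select a contiguous range of columns by name.
by_range <- select(events, Actor1Code:NumArticles)
```

Whatever follows the data frame argument in these calls is what would be passed through ... in GetGDELT.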

Author(s)

Stephen R. Haptonstahl srh@haptonstahl.org
Thomas Scherer tscherer@princeton.edu
John Beieler jub270@psu.edu

References

GDELT: Global Data on Events, Location and Tone, 1979-2013. Presented at the 2013 meeting of the International Studies Association in San Francisco, CA. https://www.gdeltproject.org/

Examples

## Not run: 
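# All events for 1979, with every column
df1 <- GetGDELT(start_date="1979-01-01", end_date="1979-12-31")

# Keep only events whose action location is coded "US"
df2 <- GetGDELT(start_date="1979-01-01", end_date="1979-12-31",
                row_filter=ActionGeo_CountryCode=="US")

# Combine several conditions and return only the first five columns
df3 <- GetGDELT(start_date="1979-01-01", end_date="1979-12-31",
                row_filter=Actor2Geo_CountryCode=="RS" & NumArticles==2 &
                  is.na(Actor1CountryCode),
                1:5)

# Match either of two actor codes and pick columns with selection helpers
df4 <- GetGDELT(start_date="1979-01-01", end_date="1979-12-31",
                row_filter=Actor2Code=="COP" | Actor2Code=="MED",
                contains("date"), starts_with("actor"))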

# Specify a local folder to store the downloaded files
df5 <- GetGDELT(start_date="1979-01-01", end_date="1979-12-31",
                row_filter=ActionGeo_CountryCode=="US",
                local_folder = "~/gdeltdata")

## End(Not run)

GDELTtools documentation built on Sept. 29, 2023, 9:07 a.m.