v2_doc_api | R Documentation |
Interact with the GDELT V2 Document API Documentation
v2_doc_api(
terms = NULL,
term_domains = NULL,
term_exact_domains = NULL,
use_exact_term = FALSE,
domains = NULL,
domains_exact = NULL,
images_face_tone = NULL,
images_number_faces = NULL,
images_ocr_meta = NULL,
image_tags = NULL,
image_web_counts = NULL,
image_web_tags = NULL,
themes_gkg = NULL,
near_terms = NULL,
near_length = 20,
repeat_terms = NULL,
repeat_count = 3,
source_languages = "english",
source_countries = "United States",
tone = NULL,
tone_absolute = NULL,
modes = "ArtList",
formats = "json",
timespans = NULL,
date_resolution = NULL,
maximum_records = 250,
sort_variable = "DateDesc",
timeline_smooth = NULL,
start_date = NULL,
end_date = NULL,
timezone_adjust = NULL,
time_zoom = NULL,
parse_data = TRUE,
widen_url_parameters = FALSE,
widen_variables = c("mode", "timespan", "format"),
nest_data = FALSE,
return_message = TRUE
)
terms |
This contains your search query and supports keyword and keyphrase searches, OR statements and a variety of advanced operators. |
term_domains |
Vector of domains to restrict the term search to. |
term_exact_domains |
Vector of exact domains to restrict the term search to. |
use_exact_term |
If 'TRUE', wraps each term in quote marks so that it is matched as an exact phrase. |
domains |
Vector of domains. Returns all coverage from the specified domain. The operator is the word "domain" followed by a colon and the domain name of interest; for example, search for "domain:cnn.com" to return all coverage from CNN. |
domains_exact |
Vector of exact domains |
images_face_tone |
Vector of tones. Searches the average "tone" of human facial emotions in each image. Only human faces that appear large enough in the image to accurately gauge their facial emotion are considered, so large crowd photos where it is difficult to see the emotion of people's faces may not be scored accurately. The tone score of an average photograph typically ranges from +2 to -2. To search for photos where visible people appear to be sad, search "imagefacetone<-1.5". Only available in the "image" modes. |
images_number_faces |
This searches the total number of foreground human faces in the image. |
images_ocr_meta |
This searches a combination of the results of OCR performed on the image in 80+ languages (to extract any text found in the image, including background text like storefronts and signage), all metadata embedded in the image file itself (EXIF, etc.) and the textual caption provided for the image. To search for images of a specific event, such as "mobile congress", you would use this field, since that information would most likely be found in signage in the background of the image, provided in the EXIF metadata of the image or listed in the caption under the image. The search parameter for this field must always be enclosed in quote marks, even when searching for a single word, as in imageocrmeta:"zika". Only available in the "image" modes. |
image_tags |
Every image processed by GDELT is assigned one or more topical tags from a universe of more than 10,000 objects and activities recognized by Google's algorithms. This is the primary and most accurate way of searching global news imagery monitored by GDELT, as these tags represent the ground truth of what is actually depicted in the image itself. |
image_web_counts |
Every image processed by GDELT is run through the equivalent of a reverse Google Images search that searches the web to see if the image has ever appeared anywhere else on the web that Google has seen. Up to the first 200 web pages where the image has been seen are returned. This operator allows you to screen for popular versus novel images. |
image_web_tags |
Every image processed by GDELT is run through the equivalent of a reverse Google Images search that searches the web to see if the image has ever appeared anywhere else on the web that Google has seen. The system then takes every one of those appearances from across the web, looks at all of the textual captions appearing beside the image and compiles a list of the major topics used to describe the image across the web. This offers a tremendous descriptive advantage, in that you are essentially "crowdsourcing" the key topics of the image by looking at how it has been described across the web. Values must be enclosed in quote marks. Only available in the "image" modes. You can access a list of all tags appearing in at least 100 images (Image WebTag Lookup). |
themes_gkg |
Searches for any of the GDELT Global Knowledge Graph (GKG) Themes. GKG Themes offer a more powerful way of searching for complex topics, since they can include hundreds or even thousands of different phrases or names under a single heading. To search for coverage of terrorism, use "theme:terror". You can find a list of all themes that have appeared in at least 100 articles over the past two years (GKG Theme Lookup). |
near_terms |
Allows you to specify a set of keywords that must appear within a given number of words of each other. To use this operator, specify the word "near", followed by the maximum distance the words can appear apart in a document and still be considered a match, a colon, and then the list of words in quote marks. Phrase matching is not supported at this time, so the list is treated as individual words that must all appear together within the given proximity. Note that if the words appear in a document in a different order than specified in the "near" operator, each ordering difference increments the word distance counted by the operator. (Thus, near10:"donald trump" will return documents where "trump" appears within 10 words after "donald", but will also return documents in which "donald" appears within 9 words after "trump".) The distance measure is not precise and can count punctuation and other tokens as "words" as well. It is also important to remember that proximity in a document does not necessarily imply that two words are semantically connected to each other. |
near_length |
Vector of maximum word distances used with near_terms (the number in the "near" operator, e.g. the 20 in near20:). |
repeat_terms |
Allows you to specify that a given word must appear at least a certain number of times in a document to be considered a match. |
repeat_count |
Vector of minimum repeat counts used with repeat_terms (the number of times each term must appear). |
source_languages |
Vector of languages. Searches for articles originally published in the given language. The GEO API currently only allows you to search the English translations of all coverage, but you can specify that you want to limit your search to articles published in a particular language. Using this operator by itself you can map all of the locations mentioned in a particular language across all topics to see the geographic focus of a given language. Search for "sourcelang:spanish" to return only Spanish language coverage. You can also specify the language's three-character language code. All 65 machine translated languages are supported. |
source_countries |
Vector of source countries. Searches for articles published in outlets located in a particular country. This allows you to narrow your scope to the press of a single country. For countries with spaces in their names, type the full name without the spaces (like "sourcecountry:unitedarabemirates" or "sourcecountry:saudiarabia"). You can also use their 2-character FIPS country code. |
tone |
Allows you to filter for only articles above or below a particular tone score (i.e., more positive or more negative than a certain threshold). To use, specify either a greater-than or less-than sign and a positive or negative number (either an integer or a floating point number). To find fairly positive articles, search for "tone>5"; to find fairly negative articles, search for "tone<-5". |
tone_absolute |
The same as "Tone" but ignores the positive/negative sign and lets you simply search for high emotion or low emotion articles, regardless of whether they were happy or sad in tone. Thus, search for "toneabs<1" for fairly neutral articles or search for "toneabs>10" for fairly emotional articles. |
modes |
Specifies the output you would like from the API, ranging from timelines to word clouds to article lists.
|
formats |
This controls the file format in which the results are returned. Not all formats are available for all modes. To assist with website embedding, the CORS ACAO header for all output of the API is set to the wildcard "*", permitting universal embedding.
|
timespans |
By default the DOC API searches the last 3 months of coverage monitored by GDELT. You can narrow this range by using this option to specify a number of months, weeks, days, hours or minutes (minimum of 15 minutes). The API then only searches documents published within the specified timespan backwards from the present time. If you would rather specify the precise start and end time of the search than an offset from the present time, use the STARTDATETIME/ENDDATETIME parameters (start_date and end_date). |
date_resolution |
These parameters allow you to specify the precise start and end date/times to search, instead of using an offset like with TIMESPAN. |
maximum_records |
Maximum number of records to return. Default 250. |
sort_variable |
By default the API sorts results by relevance to your query; this function instead defaults to "DateDesc" (most recent first). You may also sort by date or tone.
|
timeline_smooth |
This option is only available in the various Timeline modes and performs moving window smoothing over the specified number of time steps, up to a maximum of 30. Due to GDELT's high temporal resolution, timeline displays can sometimes capture too much of the chaotic noisy information environment that is the global news landscape, resulting in jagged displays. Use this option to enable moving average smoothing up to 30 days. Note that since this is a moving window average, peaks will be shifted to the right, up to several days or weeks at the heaviest smoothing levels. |
start_date |
Start date/time of the search, in YYYYMMDDHHMMSS format. |
end_date |
End date/time of the search, in YYYYMMDDHHMMSS format. |
timezone_adjust |
Timezone adjustment. |
time_zoom |
This option is only available for timeline modes in HTML format output and enables interactive zooming of the timeline using the browser-based visualization. Set it to "yes" to enable zooming; set it to "no", or omit the parameter, to disable it. By default, the browser-based timeline display allows interactive examination and export of the timeline data, but does not allow the user to rezoom the display to a narrower time span. If enabled, the user can click-drag horizontally in the graph to select a specific time period. If the visualization is being displayed directly by itself (it is the "parent" page), it will automatically refresh the page to display the revised time span. If the visualization is embedded in another page via an iframe, it will use postMessage to send the new time span to the parent page with the parameters "startdate" and "enddate" in the format needed by the STARTDATETIME and ENDDATETIME API parameters. The parent page can then use these parameters to rewrite the URLs of any API visualizations embedded in the page and reload each of them. This allows the creation of dashboard-like displays that contain multiple DOC API visualizations, where the user can zoom the timeline graph at the top and have all of the other displays automatically refresh to narrow their coverage to that revised time frame. |
parse_data |
If 'TRUE', parses the returned data. |
widen_url_parameters |
If 'TRUE', widens the URL parameters. |
widen_variables |
Variables to unite for API URLs when widening URL parameters. Default 'c("mode", "timespan", "format")'. |
nest_data |
If 'TRUE', nests the parsed data. |
return_message |
If 'TRUE', returns a message. |
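Several of the arguments above (near_terms, repeat_terms, source_countries, tone, tone_absolute, themes_gkg) correspond to text operators inside the GDELT query string. The base R sketch below writes such operators out by hand, following the syntax described on this page. The helper names near_op(), repeat_op(), source_country_op() and tone_op() are hypothetical and are not functions exported by gdeltr2; joining operators with spaces, which GDELT reads as an implicit AND, is an assumption of this sketch.

```r
# Hypothetical helpers (NOT part of gdeltr2) that build GDELT DOC API
# query operators as plain strings, following the syntax described above.

near_op <- function(words, distance = 20) {
  # Words that must appear within `distance` words of each other,
  # e.g. near10:"donald trump"
  sprintf('near%d:"%s"', as.integer(distance), paste(words, collapse = " "))
}

repeat_op <- function(word, count = 3) {
  # A word that must appear at least `count` times, e.g. repeat3:"tariff"
  sprintf('repeat%d:"%s"', as.integer(count), word)
}

source_country_op <- function(country) {
  # Country names are written without spaces, so "United Arab Emirates"
  # becomes sourcecountry:unitedarabemirates
  paste0("sourcecountry:", tolower(gsub(" ", "", country, fixed = TRUE)))
}

tone_op <- function(threshold, direction = c(">", "<"), absolute = FALSE) {
  # tone>5 / tone<-5, or toneabs<1 / toneabs>10 when absolute = TRUE
  direction <- match.arg(direction)
  field <- if (absolute) "toneabs" else "tone"
  paste0(field, direction, threshold)
}

# Assumption: operators separated by spaces are combined as AND.
query <- paste(
  near_op(c("donald", "trump"), 10),
  repeat_op("tariff", 3),
  source_country_op("United Arab Emirates"),
  "theme:terror",
  tone_op(-5, "<"),
  sep = " "
)
print(query)
```

Writing operators as small string builders keeps each rule from the argument descriptions (quoting, removing spaces from country names, the sign of the tone threshold) in one place, which makes a composed query easy to inspect before it is sent.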
library(gdeltr2)
v2_doc_api(terms = c("Donald Trump"))
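To show what a request looks like on the wire, the following base R sketch assembles a GDELT DOC 2.0 API URL from a query plus the mode, format, maxrecords, sort and either timespan or startdatetime/enddatetime parameters. The endpoint https://api.gdeltproject.org/api/v2/doc/doc and these parameter names come from GDELT's public DOC 2.0 API documentation, not from gdeltr2, and doc_api_url() is a hypothetical helper; it does not reproduce how v2_doc_api() builds its URLs. It rejects combining a relative timespan with absolute start/end times, since the timespans description presents them as alternatives.

```r
# Hypothetical helper (NOT part of gdeltr2): build a GDELT DOC 2.0 API URL.
doc_api_url <- function(query, mode = "ArtList", format = "json",
                        maxrecords = 250, sort = "DateDesc",
                        timespan = NULL, startdatetime = NULL,
                        enddatetime = NULL) {
  # A relative offset (timespan) and absolute times are alternatives.
  if (!is.null(timespan) && (!is.null(startdatetime) || !is.null(enddatetime))) {
    stop("Use either timespan or startdatetime/enddatetime, not both.")
  }
  params <- list(
    query = query, mode = mode, format = format,
    maxrecords = maxrecords, sort = sort, timespan = timespan,
    startdatetime = startdatetime, enddatetime = enddatetime
  )
  params <- Filter(Negate(is.null), params)  # drop parameters left unset
  # Percent-encode every value (spaces, quotes and colons included).
  values <- vapply(params,
                   function(v) utils::URLencode(as.character(v), reserved = TRUE),
                   character(1))
  paste0("https://api.gdeltproject.org/api/v2/doc/doc?",
         paste(names(values), values, sep = "=", collapse = "&"))
}

# Articles from US outlets mentioning the exact phrase, January 2017
# (dates in YYYYMMDDHHMMSS, as for start_date and end_date).
url <- doc_api_url('"donald trump" sourcecountry:unitedstates',
                   startdatetime = "20170101000000",
                   enddatetime = "20170131235959")
print(url)
```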
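The timeline_smooth description notes that moving-window smoothing shifts peaks to the right. A trailing (backward-looking) moving average produces exactly that effect, as this base R sketch with stats::filter() illustrates. This page does not state GDELT's exact window definition, so the trailing window is an assumption, and trailing_mean() is a hypothetical helper rather than part of gdeltr2 or the GDELT API.

```r
# Hypothetical helper: trailing moving average over k time steps.
# Assumption: GDELT's timeline smoothing behaves like a trailing window.
trailing_mean <- function(x, k) {
  stopifnot(k >= 1, k <= 30)  # the API caps timeline smoothing at 30 steps
  # sides = 1 averages the current value and the k - 1 values before it;
  # the first k - 1 results are NA because the window is not yet full.
  as.numeric(stats::filter(x, rep(1 / k, k), sides = 1))
}

volume   <- c(0, 2, 6, 10, 6, 2, 0, 0, 0)  # raw timeline, peak at step 4
smoothed <- trailing_mean(volume, 3)

which.max(volume)    # position of the raw peak
which.max(smoothed)  # position of the smoothed peak (NAs are ignored)
```

With a 3-step window, the smoothed peak lands one step later than the raw peak; heavier smoothing (larger k) pushes it further right, which is the behaviour the timeline_smooth description warns about.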