Tools to Work with the 'webhose.io' 'API'
The 'webhose.io' <https://webhose.io/about> 'API' provides access to structured web data feeds across vertical content domains. Their crawlers download the web, structure the data, then index and save it into domain-specific repositories that can be accessed on demand. Methods are provided to query and retrieve content from this 'API'.
Cover the rest of the webhose.io API.
The following functions are implemented:
filter_posts
: Retrieve structured posts data from news articles, blog posts and online discussions
fetch_posts
: Fetch all structured posts data from news articles, blog posts and online discussions
filter_reviews
: Retrieve structured reviews data from hundreds of review sites
fetch_reviews
: Fetch all structured reviews data from hundreds of review sites
filter_products
: Retrieve structured products data from thousands of online retailers and e-commerce sites
fetch_products
: Fetch all structured products data from thousands of online retailers and e-commerce sites
devtools::install_github("hrbrmstr/webhose")
options(width=120)
library(webhose)
# current version
packageVersion("webhose")
Make just one call and/or handle API pagination on your own:
res <- filter_posts("(China AND United) language:english site_type:news site:bloomberg.com", ts = 1213456)
str(res)
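If you handle pagination yourself, the loop usually has the same shape whatever the endpoint is. The following is a minimal sketch of that shape, not the package's own implementation. `paginate`, `fetch_page`, `fake_fetch`, and the `items` / `more` fields are hypothetical names invented for illustration, and the fake fetcher stands in for a real API call so the example runs offline without spending credits. To use it with `filter_posts`, check that function's documentation for its offset argument and for the names of the fields it returns.

```r
# Generic pagination loop (illustrative only).
# `fetch_page` is a HYPOTHETICAL function: it takes an offset and returns a
# list with `items` (a data frame) and `more` (count of results remaining).
paginate <- function(fetch_page, max_pages = 10) {
  pages <- list()
  offset <- 0
  for (i in seq_len(max_pages)) {       # hard cap protects API credits
    res <- fetch_page(offset)
    pages[[i]] <- res$items
    offset <- offset + nrow(res$items)  # advance past what we already have
    if (res$more <= 0 || nrow(res$items) == 0) break
  }
  do.call(rbind, pages)                 # stack all pages into one data frame
}

# Stand-in for a real API call: serves 25 fake rows, 10 per page.
fake_fetch <- function(offset, total = 25, size = 10) {
  idx <- seq_len(total)
  idx <- idx[idx > offset & idx <= offset + size]
  list(
    items = data.frame(id = idx),
    more  = max(total - (offset + length(idx)), 0)
  )
}

all_rows <- paginate(fake_fetch)
nrow(all_rows)
```

The `max_pages` cap is the main design choice here: it bounds the number of requests, so a broad query cannot consume more credits than you intended.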
Auto-handle pagination (NOTE: you're more likely to rip through your plan's API credits this way):
res <- fetch_posts("(China AND United) language:english site_type:news site:bloomberg.com", ts = 1213456)
dplyr::glimpse(res)