related_urls: Social Network Related Sites


View source: R/related_urls.R

Description

Use Google's related:site advanced search feature to find websites related to one or more sites of interest.

Usage

related_urls(x, maxurls = 10L, delay = 2, excludesites = NULL, ...)

Arguments

x

The URL, as a character string, on which to base the Google related-site search.

maxurls

an integer value specifying the maximum number of related sites to return.

delay

minimum number of seconds to delay between sending queries to Google. The delay is needed to avoid having Google detect the queries as coming from a bot.

excludesites

(default is NULL)

...

passed to html_session

Details

Performing searches on Google via a program can be difficult. Google, and many other search engines, monitor network traffic and will flag traffic that might be coming from a bot. Making several queries in quick succession can result in the queries being flagged. To address this issue, the delay argument pauses between queries for a given number of seconds.
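The pausing idea can be sketched in a few lines of R. This is a minimal illustration, not the package's actual implementation: throttled_lapply and fake_fetch are hypothetical names, and fake_fetch only builds a search string so that no network traffic is generated.

```r
# Hypothetical helper: call `fetch` on each query, pausing `delay` seconds
# between consecutive calls. `fetch` stands in for whatever function actually
# sends the request; it is not part of snaWeb.
throttled_lapply <- function(queries, fetch, delay = 2) {
  results <- vector("list", length(queries))
  for (i in seq_along(queries)) {
    if (i > 1) Sys.sleep(delay)  # no pause is needed before the first query
    results[[i]] <- fetch(queries[[i]])
  }
  names(results) <- queries
  results
}

# Stand-in fetcher that only builds the search string (assumption: no network).
fake_fetch <- function(site) paste0("related:", site)

out <- throttled_lapply(c("example.com", "example.org"), fake_fetch, delay = 0.1)
```

A fixed pause is the simplest option; a longer or randomized pause may be less likely to be flagged, but that is a guess rather than documented behavior.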

A preferable method to using the delay, and a method that *might* be implemented at a later date, is to use the Google Custom Search JSON/Atom API, https://developers.google.com/custom-search/json-api/v1/overview. As of August 30, 2017, the "API provides 100 searches per day for free... additional requests cost $5 per 1,000 queries up to 10k queries per day."
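As a rough sketch of what an API-based request could look like, the snippet below only builds a Custom Search JSON API request URL and does not send it. The helper name build_cse_url and the key and engine id values are placeholders, and whether the API honors the related: operator is an assumption that has not been verified here.

```r
# Build (but do not send) a Custom Search JSON API request URL.
# `api_key` and `engine_id` are placeholders that would come from Google.
build_cse_url <- function(query, api_key, engine_id) {
  params <- c(key = api_key, cx = engine_id, q = query)
  # Percent-encode each value, including reserved characters such as ":".
  encoded <- vapply(params, utils::URLencode, character(1), reserved = TRUE)
  paste0(
    "https://www.googleapis.com/customsearch/v1?",
    paste(names(encoded), encoded, sep = "=", collapse = "&")
  )
}

cse_url <- build_cse_url("related:example.com", "MY_KEY", "MY_ENGINE_ID")
```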

TODO: describe passing a timeout option via html_session, httr::config, ...

Value

A sna_sites object. This is a list with four elements:

nodes

A data.frame describing the nodes of a graph

url

url of the node

is_root

A character (looks like a logical, but is a character) with values "TRUE" and "FALSE" denoting if the node is a root (parent) node in the graph

id

node id

name

title of the website

specification
edges

A data.frame describing the edges of a graph

name_from

url of the parent (site) node for the edge

name_to

url of the child (site) node for the edge

id

edge id

node_from

node id of the parent node

node_to

node id of the child node

rank

The order in which the site was listed in the Google search results.

predicate
message

a character string with the value "Blocked" or "Success."

is_blocked

a logical of length 1
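To make the structure concrete, here is a small sketch that works with a mock object shaped like the return value described above. All values are invented for illustration, and the plain list res merely stands in for a real sna_sites object. Because is_root is stored as character, it is converted with as.logical() before being used to subset; indexing directly with the character column would be treated as name-based indexing.

```r
# Mock data shaped like the documented return value; all values are made up.
nodes <- data.frame(
  url     = c("example.com", "example.org", "example.net"),
  is_root = c("TRUE", "FALSE", "FALSE"),  # character, not logical
  id      = 1:3,
  name    = c("Example Com", "Example Org", "Example Net"),
  stringsAsFactors = FALSE
)
edges <- data.frame(
  name_from = c("example.com", "example.com"),
  name_to   = c("example.org", "example.net"),
  id        = 1:2,
  node_from = c(1L, 1L),
  node_to   = c(2L, 3L),
  rank      = 1:2,
  stringsAsFactors = FALSE
)
res <- list(nodes = nodes, edges = edges, message = "Success", is_blocked = FALSE)

# Convert the character flag explicitly before subsetting.
roots <- res$nodes$url[as.logical(res$nodes$is_root)]

# Related sites of the first root, in the order given by `rank`.
children <- res$edges[res$edges$name_from == roots[1], ]
related  <- children$name_to[order(children$rank)]
```

Checking is_blocked (or message) before using nodes and edges is a sensible first step, since a blocked query presumably returns little or no graph data.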

Examples

## Not run: 
related_urls("neptuneinc.org")

## End(Not run)

jhollist/snaWeb documentation built on April 7, 2020, 12:49 a.m.