WebCrawler: Get web pages

Source: R/GetRealTimeGrib.R

WebCrawler    R Documentation

Get web pages

Description

Discover all links on a given web page, follow each one, and recursively scan every link found. Return a list of web addresses whose pages contain no links.
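The traversal described above can be sketched in R as a depth-first recursion. If a page has no links, the recursion returns that page's own address. Otherwise it concatenates the results from following each link. This is a minimal sketch and not the rNOMADS implementation. To keep it self-contained and offline, a hypothetical named list, site, stands in for the web, and a hypothetical helper, get_links(), stands in for downloading a page and extracting its links. The function name crawl_leaves() is also illustrative only.

```r
# Hypothetical in-memory "website": each name is a page address and each
# value is the character vector of links found on that page.
site <- list(
  index    = c("gfs", "nam"),
  gfs      = c("gfs_run1", "gfs_run2"),
  nam      = character(0),
  gfs_run1 = character(0),
  gfs_run2 = character(0)
)

# Hypothetical stand-in for fetching a page and extracting its links.
get_links <- function(page) site[[page]]

# Depth-first recursion: return the addresses of pages that contain no links.
crawl_leaves <- function(page, verbose = FALSE) {
  links <- get_links(page)
  if (length(links) == 0) {
    # A page without links is a leaf: report it and return its address.
    if (verbose) message(page)
    return(page)
  }
  leaves <- character(0)
  for (link in links) {
    # Follow each link and collect the leaves found beneath it.
    leaves <- c(leaves, crawl_leaves(link, verbose))
  }
  leaves
}

crawl_leaves("index")
# [1] "gfs_run1" "gfs_run2" "nam"
```

Note that "nam" is returned alongside the two model runs because it is also a page without links. The recursion only knows about link structure, not page content.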

Usage

WebCrawler(url, depth = NULL, verbose = TRUE)

Arguments

url

A URL to scan for links.

depth

The maximum number of links to return. Setting it avoids recursively scanning hundreds of links. Defaults to NULL, which returns everything found.

verbose

Print out each link as it is discovered. Defaults to TRUE.
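Because depth limits how many links are returned, one way to picture it is as a budget that is handed down the recursion and shrinks as results accumulate, so that unvisited branches are skipped once the budget is spent. The following is a hypothetical sketch of that idea, not the package's code. The names site, get_links(), crawl_leaves_capped() and max_urls are illustrative assumptions, and max_urls plays the role of depth.

```r
# Hypothetical in-memory site: page address -> links on that page.
site <- list(
  index    = c("gfs", "nam"),
  gfs      = c("gfs_run1", "gfs_run2"),
  nam      = character(0),
  gfs_run1 = character(0),
  gfs_run2 = character(0)
)
get_links <- function(page) site[[page]]  # hypothetical page fetcher

# max_urls (hypothetical) caps how many leaf addresses are returned;
# NULL means no cap, mirroring depth = NULL.
crawl_leaves_capped <- function(page, max_urls = NULL) {
  if (!is.null(max_urls) && max_urls <= 0) return(character(0))
  links <- get_links(page)
  if (length(links) == 0) return(page)
  leaves <- character(0)
  for (link in links) {
    # Pass down only what is left of the budget; stop once it is used up.
    remaining <- if (is.null(max_urls)) NULL else max_urls - length(leaves)
    if (!is.null(remaining) && remaining <= 0) break
    leaves <- c(leaves, crawl_leaves_capped(link, remaining))
  }
  leaves
}

crawl_leaves_capped("index", max_urls = 2)
# [1] "gfs_run1" "gfs_run2"
```

With max_urls = 2 the page "nam" is never visited, which is the saving the depth argument is meant to provide on large sites.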

Details

CrawlModels uses this function to get all links present on a model page.

Value

urls.out

A list of web page addresses, each of which corresponds to a model instance.

Note

Running WebCrawler on a large website such as Google without setting depth will give unpredictable and possibly disastrous results. This is because the function has no protection against infinite recursion: pages that link to each other can be scanned over and over.
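WebCrawler itself has no safeguard of this kind. A common guard in crawlers is to remember which pages have already been visited and to skip them. The sketch below shows that technique on a hypothetical in-memory site where pages "a" and "b" link to each other, which would send a naive recursion into an endless loop. The names site, get_links() and crawl_leaves_safe() are illustrative assumptions, and none of them is part of rNOMADS.

```r
# Hypothetical site with a cycle: "a" links to "b" and "b" links back to "a".
site <- list(
  a     = c("b", "leaf1"),
  b     = c("a", "leaf2"),
  leaf1 = character(0),
  leaf2 = character(0)
)
get_links <- function(page) site[[page]]  # hypothetical page fetcher

crawl_leaves_safe <- function(start) {
  visited <- character(0)  # pages already scanned
  leaves  <- character(0)  # pages with no links
  visit <- function(page) {
    if (page %in% visited) return(invisible(NULL))  # break cycles
    visited <<- c(visited, page)                      # update enclosing state
    links <- get_links(page)
    if (length(links) == 0) {
      leaves <<- c(leaves, page)
    } else {
      for (link in links) visit(link)
    }
    invisible(NULL)
  }
  visit(start)
  leaves
}

crawl_leaves_safe("a")
# [1] "leaf2" "leaf1"
```

The visited set bounds the work by the number of distinct pages, whereas a cap such as depth only bounds the number of results returned.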

Author(s)

Daniel C. Bowman <danny.c.bowman@gmail.com>

See Also

CrawlModels, ParseModelPage

Examples


# Find the first 10 model runs for the
# GFS 0.5 x 0.5 degree model

## Not run: 
library(rNOMADS)
urls.out <- WebCrawler(
  "http://nomads.ncep.noaa.gov/cgi-bin/filter_gfs_0p50.pl", depth = 10)
## End(Not run)


rNOMADS documentation built on May 29, 2024, 6:44 a.m.