testURLs: Test URLs for intermittent download problems



Description

NOTE: THIS IS A PRELIMINARY VERSION OF THIS FUNCTION; IT MAY BE CHANGED OR REMOVED IN A FUTURE RELEASE.

Use try(getURL(...)) to read each element of urls. After each attempt, append a row to the CSV file named by file. recording which element of urls was tested, the test time in seconds, and any error message. Retry any failure up to maxFail times. Cycle through all elements of urls in this way n times.

If ping is TRUE, precede each test with "ping urls[i]". NOTE: Some Internet Service Providers seem to block some attempts to use "ping" or return fraudulent replies to it. It is included in the code because it seemed like an obvious test. However, it is not executed by default, because the results do not necessarily reflect what people might expect from "ping".

Return a list containing, for each element of urls, the last successful read (if any), with two attributes: (1) "urls", containing the urls argument, and (2) "testURLresults", an object of class c('testURLs', 'data.frame') containing the test results written to the CSV file.

This function was written to diagnose a download problem with a particular Internet Service Provider (ISP). For other tools for testing an ISP, see measurementlab.net or the "Test your ISP" software discussion by the Electronic Frontier Foundation at the URL mentioned in references below.

Usage

testURLs(urls=c(
  wiki="http://en.wikipedia.org",
  wiki.PVI="http://en.wikipedia.org/wiki/Cook_Partisan_Voting_Index",
  house="http://house.gov",
  house.reps="http://house.gov/representatives"),
  file.='testURLresults.csv',
  n=10, maxFail=10, warn=-1, tzone='GMT', ping=FALSE, ...)

Arguments

urls

a character vector of uniform resource locators (URLs) to pass to getURL for testing.

The default was selected to provide a 2 x 2 experiment: two different web sites (en.wikipedia.org and house.gov) crossed with two page types (the landing page and a subordinate page of each site).
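To analyze results from the default urls as a 2 x 2 experiment, each URL name can be tagged with its site and page type. The sketch below is illustrative only; the data frame design and its factor labels are hypothetical and not part of testURLs.

```r
# Hypothetical annotation of the default urls with the two design factors.
urls <- c(
  wiki       = "http://en.wikipedia.org",
  wiki.PVI   = "http://en.wikipedia.org/wiki/Cook_Partisan_Voting_Index",
  house      = "http://house.gov",
  house.reps = "http://house.gov/representatives")
design <- data.frame(
  URL  = names(urls),
  site = factor(c("wikipedia", "wikipedia", "house.gov", "house.gov")),
  page = factor(c("landing", "subordinate", "landing", "subordinate")))
# Each site-by-page cell holds exactly one URL: a balanced 2 x 2 layout.
print(table(design$site, design$page))
```

Merging design onto the test results by URL, e.g. merge(results, design, by = "URL"), would then allow site and page effects on the failure rate to be compared, for example with glm(failed ~ site + page, family = binomial).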

file.

Name of a CSV file to which to write the results. If the file already exists, new results are appended to it.

n

number of times to repeat the cycle testing each member of urls.

maxFail

maximum number of attempts for a URL that keeps failing. This is designed to make it relatively easy to determine dependencies between failures. If each attempt fails independently with a constant probability, the number of consecutive failures will follow a geometric distribution (capped at maxFail). Otherwise, it may be possible to evaluate various effects using, e.g., state space techniques for non-normal time series. These could include daily and weekly cycles, possibly with holiday effects, as well as trends or drifts suggesting abnormal changes in web traffic congestion.
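If every attempt fails independently with the same probability p, the number K of consecutive failures before a success satisfies P(K = k) = p^k (1 - p). With at most maxFail attempts, runs of maxFail or more failures are only observed as "all attempts failed". A minimal base-R sketch of these probabilities, where the value of p is a hypothetical assumption:

```r
p <- 0.2          # hypothetical per-attempt failure probability
maxFail <- 10     # cap on attempts for a failing URL, as in testURLs
k <- 0:(maxFail - 1)
# dgeom counts failures before the first success; its prob is the success probability.
pK <- dgeom(k, prob = 1 - p)       # P(K = k) for k < maxFail
pAllFail <- p^maxFail              # P(K >= maxFail): every attempt failed
print(round(pK, 4))
print(pAllFail)
```

Comparing observed run-length frequencies with pK and pAllFail is one simple check of the constant-rate, independent-failure assumption.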

warn

warn argument to pass to Ping.

tzone

Time zone for the Time column of the results. Defaults to GMT (UTC); tzone=NULL uses the local time zone of the R session.
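The sketch below only illustrates how one instant renders in different time zones with base R's format.POSIXct; exactly how testURLs applies tzone internally is an assumption not shown here.

```r
# One instant, rendered in GMT and in the session's local time zone.
t0 <- as.POSIXct("2019-05-02 18:53:00", tz = "GMT")
gmt   <- format(t0, "%Y-%m-%d %H:%M:%S %Z", tz = "GMT")
local <- format(t0, "%Y-%m-%d %H:%M:%S %Z", tz = "")  # "" = local zone
print(gmt)    # the GMT rendering of t0
print(local)  # depends on the machine's time-zone setting
```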

ping

logical: TRUE to include Ping, FALSE otherwise.

...

optional arguments for Ping.

Details

for(i in 1:n), and for each element j of urls:

1. If ping is TRUE, pingj <- Ping(urls[j], ...)

2. The time for each call to getURL is measured by recording start.time <- proc.time() before calling try(getURL(.)) and then computing:

elapsed.time <- max(proc.time() - start.time, na.rm=TRUE)

After each element of urls is tested, a summary of the results is appended to the CSV file named by file.; the summary includes pingj[['stats']] (when ping is TRUE), elapsed.time, and the error message if the download failed.
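The cycle described above can be sketched offline as follows. This is a minimal sketch, not the Ecfun source: it omits Ping, and the function testLoopSketch and its helper fakeGetURL are hypothetical, with fakeGetURL standing in for RCurl's getURL so that no network access is needed. It reuses the proc.time() timing and try() pattern from the Details.

```r
# Offline sketch of the testURLs cycle (not the Ecfun source).
# Assumption: fakeGetURL is a hypothetical stand-in for getURL that
# fails on the first request for each URL and succeeds afterwards.
testLoopSketch <- function(urls, file., n = 2, maxFail = 3) {
  calls <- setNames(integer(length(urls)), names(urls))
  fakeGetURL <- function(u, nm) {
    calls[nm] <<- calls[nm] + 1L
    if (calls[nm] == 1L) stop("simulated download failure")
    paste("<html>", u, "</html>")
  }
  last <- setNames(vector("list", length(urls)), names(urls))
  for (i in seq_len(n)) {                 # n cycles over all urls
    for (j in seq_along(urls)) {          # each URL in turn
      for (attempt in seq_len(maxFail)) { # retry a failing URL up to maxFail times
        start.time <- proc.time()
        res <- try(fakeGetURL(urls[[j]], names(urls)[j]), silent = TRUE)
        elapsed.time <- max(proc.time() - start.time, na.rm = TRUE)
        failed <- inherits(res, "try-error")
        errMsg <- if (failed) conditionMessage(attr(res, "condition")) else ""
        row <- data.frame(Time = date(), URL = names(urls)[j],
                          readTime = elapsed.time, error = errMsg)
        newFile <- !file.exists(file.)
        # Append one row per attempt; write the header only for a new file.
        write.table(row, file., sep = ",", append = !newFile,
                    col.names = newFile, row.names = FALSE)
        if (!failed) {
          last[[j]] <- res                # keep the last successful read
          break
        }
      }
    }
  }
  last
}

f <- tempfile(fileext = ".csv")
out <- testLoopSketch(c(a = "http://example.com/a", b = "http://example.com/b"), f)
print(read.csv(f))
```

Writing one row per attempt, rather than one per URL, keeps each failure visible in the CSV file, which is what makes runs of consecutive failures countable afterwards.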

The Electronic Frontier Foundation provides a table of existing software to "Test your ISP"; see the References below. This table includes a column noting whether the software is "active" (sending test traffic) or "passive" (observing the way the network treats natural traffic). The testURLs function is "active", because it requests the content at each indicated URL.

Value

an object of class testURLs, which in this case is a list of the last successful result returned by getURL for each element of urls with the following attributes:

urls

the urls argument used for this call

testURLresults

an object of class c('testURLs', 'data.frame') containing the data written to the CSV file. It has the following columns:

  • Time: date() at the time a particular test started

  • URL: the name in urls of the URL tested

  • ping statistics: several columns with the count and stats returned by Ping.

  • readTime: time in seconds for the attempt to read the URL (getURL(urls[j])) to complete.

  • error: character; '' if the read attempt was successful, the error message if not.
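Because error is '' for a successful read, per-URL failure rates and mean read times follow directly from these columns. In the sketch below, the data frame res holds hypothetical rows shaped like testURLresults (Time and ping columns omitted); in practice it would come from attr(tst, 'testURLresults') or from read.csv on the results file.

```r
# Hypothetical rows shaped like testURLresults (Time and ping columns omitted).
res <- data.frame(
  URL      = c("PVI", "house", "PVI", "house"),
  readTime = c(0.8, 2.1, 0.7, 30.0),
  error    = c("", "", "", "hypothetical error message"),
  stringsAsFactors = FALSE)
res$failed <- res$error != ""   # '' marks a successful read
# Per-URL failure fraction and mean read time in seconds.
summ <- aggregate(cbind(failed, readTime) ~ URL, data = res, FUN = mean)
print(summ)
```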

Author(s)

Spencer Graves

References

measurementlab.net

"Test your ISP" software discussion by the Electronic Frontier Foundation, including a table that classifies each tool as "active" (sending test traffic) or "passive" (observing the way the network treats natural traffic).

See Also

try, getURL, Ping

Examples

library(Ecfun)
# Needs network access; results are appended to 'testURLresults.csv'.
# Test only 2 web sites, not the default 4,
# and test only twice, not the default 10 times:
tst <- testURLs(c(
  PVI="http://en.wikipedia.org/wiki/Cook_Partisan_Voting_Index",
  house="http://house.gov/representatives"),
  n=2, maxFail=2)

# Check the structure of the result: class, element names, and attributes
(class(tst) == 'testURLs') &&
  all(names(tst) == c('PVI', 'house')) &&
  all(names(attributes(tst)) ==
      c('names', 'urls', 'testURLresults', 'class'))

Ecfun documentation built on May 2, 2019, 6:53 p.m.