Load plain-text data and RData from a URL (either http or https)

Description

source_data loads plain-text or RDATA formatted data stored at a URL (both http and https) into R.

Usage

source_data(url, rdata, sha1 = NULL, cache = FALSE, clearCache = FALSE,
  sep = "auto", header = "auto", stringsAsFactors = FALSE,
  envir = parent.frame(), ...)

Arguments

url

The data's URL. To distinguish between plain-text and RDATA the url must end in a distinguishing file extension.

rdata

logical. Whether or not the data set is an .RDATA file. If not specified, then source_data will attempt to determine whether or not the file is an .RDATA file from the URL's extension.

sha1

Character string of the file's SHA-1 hash, as generated by source_data. Note: if the data are stored using Git, this is not the file's commit SHA-1 hash.

cache

logical. Whether or not to cache the data so that it is not downloaded every time the function is called.

clearCache

logical. Whether or not to clear the downloaded data from the cache.

sep

The separator method for the plain-text data. For example, to load comma-separated values data (CSV) use sep = ",". To load tab-separated values data (TSV) use sep = "\t". Only relevant for plain-text data.

header

logical. Whether or not the first line of the file is the header (i.e. the variable names). Defaults to "auto".

stringsAsFactors

logical. Convert all character columns to factors?

envir

the environment where the data should be loaded.

...

additional arguments passed to fread or load as relevant.
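
To illustrate what the sep, header, and stringsAsFactors arguments control for plain-text data, here is a minimal offline sketch that calls fread from the data.table package directly on inline text rather than on a URL. It assumes data.table (version 1.11.0 or later, for the text argument) is installed. The sample data and the object names csv_text, tsv_text, csv_data, and tsv_data are made up for illustration, and how source_data forwards these arguments internally is not shown here.

```r
library(data.table)

# Made-up sample data: the same two-column table as CSV and as TSV
csv_text <- "id,score\n1,0.5\n2,0.8\n"
tsv_text <- "id\tscore\n1\t0.5\n2\t0.8\n"

# sep = "," for comma-separated values; header = TRUE treats the first
# line as variable names; data.table = FALSE returns a plain data frame
csv_data <- fread(text = csv_text, sep = ",", header = TRUE,
                  stringsAsFactors = FALSE, data.table = FALSE)

# sep = "\t" for tab-separated values
tsv_data <- fread(text = tsv_text, sep = "\t", header = TRUE,
                  stringsAsFactors = FALSE, data.table = FALSE)

# Both calls should yield the same columns
names(csv_data)
names(tsv_data)
```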

Details

Loads plain-text data (e.g. CSV, TSV) or RDATA from a URL. Works with both HTTP and HTTPS sites. Note: the URL you give for the url argument must be for the RAW version of the file. The function should be able to download plain-text data from any secure (HTTPS) URL, though I have not verified this.

From the source_url documentation: "If a SHA-1 hash is specified with the sha1 argument, then this function will check the SHA-1 hash of the downloaded file to make sure it matches the expected value, and throw an error if it does not match. If the SHA-1 hash is not specified, it will print a message displaying the hash of the downloaded file. The purpose of this is to improve security when running remotely-hosted code; if you have a hash of the file, you can be sure that it has not changed."
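
The check described in the quoted passage can be sketched offline on a local file. This is a minimal sketch, assuming the digest package is installed. The helper check_sha1 is hypothetical and written only for illustration; it is not the implementation used by source_data or source_url.

```r
library(digest)

# Hypothetical helper: compare a file's SHA-1 hash with an expected value.
# If no expected hash is given, report the file's hash instead.
check_sha1 <- function(path, sha1 = NULL) {
  # file = TRUE hashes the file's contents rather than the path string
  file_sha1 <- digest(path, algo = "sha1", file = TRUE)
  if (is.null(sha1)) {
    message("SHA-1 hash of file is ", file_sha1)
  } else if (!identical(file_sha1, tolower(sha1))) {
    stop("SHA-1 hash of file (", file_sha1,
         ") does not match expected value (", sha1, ")")
  }
  invisible(file_sha1)
}

# Stand-in for a downloaded file: a temporary file with known contents
tmp <- tempfile(fileext = ".csv")
writeBin(charToRaw("abc"), tmp)

hash <- check_sha1(tmp)        # no expected hash: prints the file's hash
check_sha1(tmp, sha1 = hash)   # matching hash: returns silently
```

Passing a hash that does not match makes the helper stop with an error, which mirrors the behaviour the quoted passage describes.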

Value

a data frame

Source

Originally based on source_url from Hadley Wickham's devtools package.

See Also

httr, fread, and load

Examples

## Not run: 
# Download electoral disproportionality data stored on GitHub
# Note: Using shortened URL created by bitly
DisData <- source_data("http://bit.ly/156oQ7a")

# Check to see if SHA-1 hash matches downloaded file
DisDataHash <- source_data("http://bit.ly/Ss6zDO",
   sha1 = "dc8110d6dff32f682bd2f2fdbacb89e37b94f95d")
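
# Illustrative sketch only (not verified against a live URL): cache the
# download so that repeated calls do not fetch the file again. The object
# names DisDataCached and DisDataFresh are made up for this example.
DisDataCached <- source_data("http://bit.ly/156oQ7a", cache = TRUE)

# Clear the cached copy; this is assumed to trigger a fresh download
DisDataFresh <- source_data("http://bit.ly/156oQ7a", cache = TRUE,
   clearCache = TRUE)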

## End(Not run)