etl_cleanup | R Documentation |
These generic functions provide a systematic approach for performing ETL (exchange-transform-load) operations on medium sized data.
etl_cleanup(obj, ...)
## Default S3 method:
etl_cleanup(
obj,
delete_raw = FALSE,
delete_load = FALSE,
pattern = "\\.(csv|zip)$",
...
)
etl_create(obj, ...)
## Default S3 method:
etl_create(obj, ...)
etl_update(obj, ...)
## Default S3 method:
etl_update(obj, ...)
etl_extract(obj, ...)
## Default S3 method:
etl_extract(obj, ...)
## S3 method for class 'etl_mtcars'
etl_extract(obj, ...)
## S3 method for class 'etl_cities'
etl_extract(obj, ...)
etl_load(obj, ...)
## Default S3 method:
etl_load(obj, ...)
etl_transform(obj, ...)
## Default S3 method:
etl_transform(obj, ...)
## S3 method for class 'etl_cities'
etl_transform(obj, ...)
obj |
an |
... |
arguments passed to methods |
delete_raw |
should files be deleted from the |
delete_load |
should files be deleted from the |
pattern |
regular expression matching file names to be deleted. By default,
this matches filenames ending in |
The purposes of these functions are to download data from a particular data source from the Internet, process it, and load it into a SQL database server.
There are five primary functions:
etl_init
Initialize the database schema.
Download data from the Internet and store it locally in its raw form.
Manipulate the raw data such that it can be loaded into a database table. Usually, this means converting the raw data to (a series of) CSV files, which are also stored locally.
Load the transformed data into the database.
Perform housekeeping, such as deleting unnecessary raw data files.
Additionally, two convenience functions chain these operations together:
Run all five functions in succession. This is useful when you want to create the database from scratch.
Run the etl_extract
-etl_transform
-etl_load
functions
in succession.
This is useful
when the database already exists, but you want to insert some new data.
Each one of these functions returns an etl
object, invisibly.
etl
, etl_init
## Not run:
if (require(RPostgreSQL)) {
db <- src_postgres(dbname = "mtcars", user = "postgres", host = "localhost")
cars <- etl("mtcars", db)
}
if (require(RMySQL) && mysqlHasDefault()) {
db <- src_mysql(dbname = "mtcars", user = "r-user",
host = "localhost", password = "mypass")
cars <- etl("mtcars", db)
}
## End(Not run)
cars <- etl("mtcars")
cars %>%
etl_extract() %>%
etl_transform() %>%
etl_load() %>%
etl_cleanup()
cars
cars %>%
tbl(from = "mtcars") %>%
group_by(cyl) %>%
summarise(N = n(), mean_mpg = mean(mpg))
# do it all in one step, and peek at the SQL creation script
cars %>%
etl_create(echo = TRUE)
# specify a directory for the data
## Not run:
cars <- etl("mtcars", dir = "~/dumps/mtcars/")
str(cars)
## End(Not run)
cars <- etl("mtcars")
# Do it step-by-step
cars %>%
etl_extract() %>%
etl_transform() %>%
etl_load()
# Note the somewhat imprecise data types for the columns. These are the default.
tbl(cars, "mtcars")
# But you can also specify your own schema if you want
schema <- system.file("sql", "init.sqlite", package = "etl")
cars %>%
etl_init(schema) %>%
etl_load()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.