etl: Initialize an 'etl' object

etl    R Documentation

Initialize an etl object

Description

Initialize an etl object: a src_dbi data source that also records where its raw and processed data files are stored.

Usage

etl(x, db = NULL, dir = tempdir(), ...)

## Default S3 method:
etl(x, db = NULL, dir = tempdir(), ...)

## S3 method for class 'etl'
summary(object, ...)

is.etl(object)

## S3 method for class 'etl'
print(x, ...)

Arguments

x

the name of the etl package that you wish to populate with data. This determines the class of the resulting etl object, which in turn determines method dispatch of the etl_*() functions. There is no default, but you can use mtcars as a test example.

db

a database connection that inherits from src_dbi. It is NULL by default, which results in a SQLite connection being created in dir.

dir

a directory to store the raw and processed data files

...

arguments passed to methods (currently ignored)

object

an object for which a summary is desired.

Details

A constructor function that instantiates an etl object. An etl object extends a src_dbi object. It also has attributes for:

pkg

the name of the etl package corresponding to the data source

dir

the directory where the raw and processed data are stored

raw_dir

the directory where the raw data files are stored

load_dir

the directory where the processed data files are stored

Just like any src_dbi object, an etl object is a data source backed by an SQL database. However, an etl object has additional functionality based on the presumption that the SQL database will be populated from data files stored on the local hard disk. The ETL functions documented in etl_create provide the necessary functionality for three steps: extracting data from the Internet to raw_dir, transforming those data and placing the cleaned-up data (usually in CSV format) into load_dir, and finally loading the clean data into the SQL database.
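The attributes listed above can be read with base R's attr(). The following is a minimal sketch, assuming the etl package is installed and that the bundled mtcars example needs no network access; which files appear in each directory depends on the package version, so no output is shown.

```r
library(etl)

# Instantiate against the mtcars example (db = NULL, so SQLite in tempdir())
cars <- etl("mtcars")

# Read the attributes described in Details
attr(cars, "pkg")
raw_dir  <- attr(cars, "raw_dir")
load_dir <- attr(cars, "load_dir")

# Run the first two stages, then inspect what landed on disk
cars <- etl_transform(etl_extract(cars))
list.files(raw_dir)   # raw files written by etl_extract()
list.files(load_dir)  # cleaned files written by etl_transform()
```

Keeping the raw and processed files in separate directories means the transform step can be rerun without downloading the data again.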

Value

For etl, an object of class etl_x (where x is the package name supplied, for example etl_mtcars) and etl, which inherits from src_dbi

For is.etl, TRUE or FALSE, depending on whether object has class etl
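The return values described above can be checked with inherits(). This is a small sketch, assuming the etl package is installed and that the package-specific class is formed as "etl_" followed by the value of x; the remaining entries in the class vector depend on the database backend.

```r
library(etl)

cars <- etl("mtcars")

# Full class vector; backend-specific entries vary with the connection type
class(cars)

# Assumption: the package-specific class is "etl_" followed by the value of x
inherits(cars, "etl_mtcars")
inherits(cars, "etl")

# is.etl() accepts any object; a plain data frame is not an etl object
is.etl(cars)
is.etl(mtcars)
```

Method dispatch for the etl_*() functions relies on the package-specific class, which is why the value passed as x matters.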

See Also

etl_create

Examples


# Attach etl, plus dplyr for %>%, tbl(), group_by() and summarize()
library(etl)
library(dplyr)

# Instantiate the etl object
cars <- etl("mtcars")
str(cars)
is.etl(cars)
summary(cars)

## Not run: 
# connect to a PostgreSQL server
if (require(RPostgreSQL)) {
  db <- src_postgres("mtcars", user = "postgres", host = "localhost")
  cars <- etl("mtcars", db)
}

## End(Not run)

# Do it step-by-step
cars %>%
  etl_extract() %>%
  etl_transform() %>%
  etl_load()
src_tbls(cars)
cars %>%
  tbl("mtcars") %>%
  group_by(cyl) %>%
  summarize(N = n(), mean_mpg = mean(mpg))

# Do it all in one step
cars2 <- etl("mtcars")
cars2 %>%
  etl_update()
src_tbls(cars2)


# The generic summary() method provides information about the object
cars <- etl("mtcars")
summary(cars)

# is.etl() returns TRUE for an etl object
is.etl(cars)

# ... and FALSE for anything else
is.etl("hello world")

# print() method: run the full pipeline with etl_create(), then display the object
cars <- etl("mtcars") %>%
  etl_create()
cars

etl documentation built on Oct. 13, 2023, 1:08 a.m.