etl_extract.etl_imdb: Set up local IMDB
In beanumber/imdb: Populate a database with data from the IMDB


Download the raw data files from IMDB

## S3 method for class 'etl_imdb'
etl_extract(obj, tables = c("movies", "actors",
  "actresses", "directors"), all.tables = FALSE, ...)

## S3 method for class 'etl_imdb'
etl_load(obj, path_to_imdbpy2sql = NULL,
  password = "", ...)

etl_load_data(obj, ...)

## S3 method for class 'src_mysql'
etl_load_data(obj, ...)

`obj`	an `etl` object
`tables`	a character vector of files from IMDB to download. The default is movies, actors, actresses, and directors. These four files alone will occupy more than 500 MB of disk space. There are 49 total files available on IMDB. See ftp://ftp.fu-berlin.de/pub/misc/movies/database/ for the complete list.
`all.tables`	a logical indicating whether you want to download all of the tables. Default is `FALSE`.
`...`	arguments passed to methods
`path_to_imdbpy2sql`	a path to the `imdbpy2sql.py` Python script provided by IMDbPY. If `NULL` (the default), the function will attempt to find it using `findimdbpy2sql`.
`password`	the database password. You must re-enter it here unless your password is blank. The real password will not be shown in messages.
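
As a rough illustration of what the `tables` argument selects, the sketch below builds one download URL per requested table. The helper name `imdb_list_urls` is hypothetical and not part of this package, and the `.list.gz` file suffix is an assumption about how the files are named on the mirror.

```r
# Hypothetical helper (not part of the imdb package): map table names
# to download URLs on the mirror listed in the `tables` argument.
# Assumption: each table is stored as "<table>.list.gz".
imdb_list_urls <- function(tables = c("movies", "actors",
                                      "actresses", "directors"),
                           base_url = "ftp://ftp.fu-berlin.de/pub/misc/movies/database/") {
  stopifnot(is.character(tables), length(tables) > 0)
  paste0(base_url, tables, ".list.gz")
}

# One URL per requested table; the default yields four.
urls <- imdb_list_urls()
length(urls)
imdb_list_urls("movies")
```

Requesting only the tables you need keeps the download small, since the four default files alone occupy more than 500 MB.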

For best performance, set the MySQL default collation to utf8_unicode_ci. See the IMDbPy2sql documentation at http://imdbpy.sourceforge.net/docs/README.sqldb.txt for more details.
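
One possible way to apply the recommended collation is to issue an `ALTER DATABASE` statement before loading. The sketch below only builds the statement; the helper name `collation_sql` is hypothetical, and it assumes the character set is the prefix of the collation name (here `utf8`).

```r
# Hypothetical helper: build a MySQL statement that sets the default
# character set and collation of an existing database.
# Assumption: the character set is the part of the collation name
# before the first underscore (utf8_unicode_ci -> utf8).
collation_sql <- function(dbname, collation = "utf8_unicode_ci") {
  charset <- sub("_.*$", "", collation)
  sprintf("ALTER DATABASE `%s` CHARACTER SET %s COLLATE %s;",
          dbname, charset, collation)
}

stmt <- collation_sql("imdb")
print(stmt)

# With an open DBI connection `con` to the MySQL server, the statement
# could then be run with:
#   DBI::dbExecute(con, stmt)
```

The statement changes only the database default; tables that already exist keep their own collation.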

Please be aware that IMDB contains information about *all* types of movies, including adult titles.

IMDB: ftp://ftp.fu-berlin.de/pub/misc/movies/database/temporaryaccess/

IMDbPy: http://imdbpy.sourceforge.net/

# Connect using default RSQLite database
imdb <- etl("imdb")

# Connect using pre-configured PostgreSQL database
## Not run: 
if (require(RPostgreSQL)) {
  # must have pre-existing database "imdb"
  db <- src_postgres(host = "localhost", user = "postgres",
                     password = "postgres", dbname = "imdb")
  # create the etl object only when the connection exists
  imdb <- etl("imdb", db = db, dir = "~/dumps/imdb/")
  imdb %>%
    etl_extract(tables = "movies") %>%
    etl_load()
}

## End(Not run)
## Not run: 
# Connect using pre-configured MySQL database
if (require(RMySQL)) {
  # must have pre-existing database "imdb"
  db <- src_mysql_cnf(dbname = "imdb")
  imdb <- etl("imdb", db = db, dir = "~/dumps/imdb/")
  imdb %>%
    etl_extract(tables = "movies") %>%
    etl_load()

  # look up a title in the "title" table
  movies <- imdb %>%
    tbl("title")
  movies %>%
    filter(title == 'star wars')

  # join titles to cast entries and people, ordered by nr_order
  people <- imdb %>%
    tbl("name")
  roles <- imdb %>%
    tbl("cast_info")
  movies %>%
    inner_join(roles, by = c("id" = "movie_id")) %>%
    inner_join(people, by = c("person_id" = "id")) %>%
    filter(title == 'star wars') %>%
    filter(production_year == 1977) %>%
    arrange(nr_order)
}

## End(Not run)