website/file_formats.markdown

ProjectTemplate can automatically load a variety of text-based file formats, including comma-separated value (CSV) files, tab-separated value (TSV) files and generic whitespace-separated value (WSV) files. In addition, automatic data loading is supported for several binary file formats, including the RData, SPSS, Stata and SAS formats.

Beyond those file formats, several ad hoc file types let you load data sets that are accessible over HTTP or stored in SQL databases, such as MySQL and SQLite.

Please note that several of these file formats have not been tested yet, including Weka files, DBF files, EPIInfo files, MTP files, Octave files, Systat files and SAS files. Because ProjectTemplate is simply wrapping the 'foreign' package, these file formats are expected to work, but we have not yet confirmed that. Your mileage may vary.

Supported File Extensions

Ad Hoc File Types

URL Files

You can access CSV files over HTTP using the .url file extension. Inside the .url file, you must place DCF (Debian Control File format, that is, "field: value" lines) that describes your data source. An example file is shown below:

    url: http://www.johnmyleswhite.com/ProjectTemplate/sample_data.csv
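
To make the "field: value" layout of the example above concrete, here is a minimal sketch in Python of how such a file could be split into fields. ProjectTemplate is an R package and reads these files with its own code; the `parse_dcf` helper is hypothetical, treats the whole text as a single record, and ignores DCF features such as continuation lines.

```python
def parse_dcf(text):
    """Parse a single-record DCF snippet into a dict of field -> value.

    Hypothetical helper for illustration only; not part of ProjectTemplate.
    """
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the first colon only, so colons inside the value
        # (as in "http://...") stay part of the value.
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"not a 'field: value' line: {line!r}")
        fields[key.strip()] = value.strip()
    return fields


url_file = "url: http://www.johnmyleswhite.com/ProjectTemplate/sample_data.csv"
config = parse_dcf(url_file)
print(config["url"])
```

Splitting on the first colon only is the detail worth noting: the field name ends at the first colon, and everything after it, including the colon in the URL scheme, belongs to the value.
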

SQL Files

ProjectTemplate supports access to many of the most common databases. All databases use the .sql file extension. Inside the .sql file, you must place DCF that describes the connection protocol for your database. Example files for the supported databases are shown below.

MySQL:

    type: mysql
    user: sample_user
    password: sample_password
    host: localhost
    dbname: sample_database
    table: sample_table

SQLite:

    type: sqlite
    dbname: /path/to/sample_database
    table: sample_table

    type: sqlite
    dbname: /path/to/sample_database
    query: SELECT * FROM users WHERE user_active == 1
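
The two SQLite examples above differ only in whether they name a table or give a query. The following Python sketch illustrates one plausible reading of that difference, assuming that a table field means "load every row of that table". That assumption, the `build_query` helper, and the in-memory sample data are all hypothetical and are not taken from ProjectTemplate's R implementation.

```python
import sqlite3


def build_query(config):
    """Return the SQL to run for a parsed .sql configuration (hypothetical)."""
    if "query" in config:
        return config["query"]
    # Assumption: naming a table is shorthand for selecting all of its rows.
    # The table name is quoted as an identifier; names that themselves
    # contain double quotes are not handled in this sketch.
    return 'SELECT * FROM "{}"'.format(config["table"])


# Build a small in-memory database so the sketch runs on its own.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, user_active INTEGER)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("ann", 1), ("bob", 0), ("cy", 1)],
)

table_config = {"type": "sqlite", "table": "users"}
query_config = {
    "type": "sqlite",
    "query": "SELECT * FROM users WHERE user_active == 1",
}

all_rows = conn.execute(build_query(table_config)).fetchall()
active_rows = conn.execute(build_query(query_config)).fetchall()
print(len(all_rows), len(active_rows))
```

Under that reading, the query form is the one to reach for when only a subset of rows or columns is wanted, since the filtering then happens inside the database rather than after loading.
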

PostgreSQL:

    type: postgres
    user: sample_user
    password: sample_password
    host: localhost
    dbname: sample_database
    table: sample_table

ODBC:

    type: odbc
    dsn: sample_dsn
    user: sample_user
    password: sample_password
    dbname: sample_database
    query: SELECT * FROM sample_table

Oracle:

    type: oracle
    user: sample_user
    password: sample_password
    dbname: sample_database
    table: sample_table

JDBC:

    type: jdbc
    class: oracle.jdbc.OracleDriver
    classpath: /path/to/ojdbc5.jar (or set in CLASSPATH)
    user: scott
    password: tiger
    url: jdbc:oracle:thin:@myhost:1521:orcl
    query: SELECT * FROM emp

Heroku PostgreSQL:

This is a special case of the JDBC driver. It requires the current PostgreSQL JDBC jar file.

    type: heroku
    classpath: /path/to/jdbc4.jar (or set in CLASSPATH)
    user: scott
    password: tiger
    host: heroku.postgres.url
    port: 1234
    dbname: herokudb 
    query: select * from emp

.file Files

You can load data that is not stored in the current project using a .file file. You must specify the path to the data and the extension that the file would have if it were being loaded by the standard ProjectTemplate auto-loader. The example below would load an SQLite3 database stored in a separate location:

    path: /path/to/sample_database
    extension: db

Future Support For Data Sources

It is possible to provide support for new data sources by hooking into ProjectTemplate. The ElasticSearch reader is a working example of how to achieve this. We are looking forward to linking to your custom readers for new data sources, such as SQL Server, MongoDB or CouchDB. Please use the mailing list to get in touch with us.



KentonWhite/rsangole-201-rstudio documentation built on May 24, 2019, 2:33 p.m.