copy_into_f: Copy Data from the Data Lake to the Data Warehouse

View source: R/copy_into.R

copy_into_f R Documentation

Copy Data from the Data Lake to the Data Warehouse

Description

This function copies data from the data lake to the data warehouse.

Usage

copy_into_f(
  conn,
  server = NULL,
  config = NULL,
  config_url = NULL,
  config_file = NULL,
  to_schema = NULL,
  to_table = NULL,
  db_name = NULL,
  dl_path = NULL,
  file_type = c("csv", "parquet", "orc"),
  identity = NULL,
  secret = NULL,
  max_errors = 100,
  compression = c("none", "gzip", "defaultcodec", "snappy"),
  field_quote = "",
  field_term = NULL,
  row_term = NULL,
  first_row = 2,
  overwrite = T,
  rodbc = F,
  rodbc_dsn = "int_edw_16"
)
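
The vector defaults for file_type and compression list the accepted values. A common R idiom for arguments declared this way is match.arg(), which picks the first element when the argument is omitted and raises an error for an unlisted value. The sketch below illustrates that idiom with a hypothetical helper, pick_options(). It is an assumption about how such defaults typically behave, not a copy of the function's source.

```r
# Hypothetical helper illustrating the match.arg() idiom; not part of apde.
pick_options <- function(file_type = c("csv", "parquet", "orc"),
                         compression = c("none", "gzip", "defaultcodec", "snappy")) {
  # match.arg() returns the first choice when the argument is left at its default
  file_type <- match.arg(file_type)
  compression <- match.arg(compression)
  list(file_type = file_type, compression = compression)
}

# No arguments supplied: the first element of each vector is used
defaults <- pick_options()

# Explicit values must be one of the listed choices
custom <- pick_options(file_type = "parquet", compression = "snappy")

# An unlisted value raises an error instead of being passed through
bad <- tryCatch(pick_options(file_type = "xlsx"), error = function(e) "error")
```

Under this idiom, leaving both arguments unset would correspond to an uncompressed CSV load.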

Arguments

conn

SQL Server connection created using the odbc package.

server

server name, i.e., 'hhsaw' or 'phclaims'

config

An object in memory with the YAML config file contents (should be blank if using config_url or config_file).

config_url

The URL location of the YAML config file (should be blank if using config or config_file).

config_file

The path and file name of the YAML config file (should be blank if using config or config_url).

to_schema

schema name

to_table

table name

db_name

database name, e.g., "hhs_analytics_workspace", "inthealth_edw", etc.

dl_path

The path to the data lake where the source files are located.

file_type

file type, i.e., "csv", "parquet", or "orc".

identity

The identity (username or account name) used for authentication when accessing the data lake.

secret

The secret key or password associated with the identity for authentication.

max_errors

The total number of records that can be rejected before the entire file is rejected by the system.

compression

compression used, i.e., "none", "gzip", "defaultcodec", or "snappy".

field_quote

The character used to quote fields in the input file (e.g., double quotes). Default is field_quote = "" (an empty string).

field_term

The character or string used to separate fields in the input file (e.g., comma for CSV).

row_term

The character or string used to separate rows in the input file (e.g., newline character).

first_row

The row number where data begins in the input file (excluding headers if present).

overwrite

Logical; if TRUE, truncate the table, if it exists, before copying data into it (default is TRUE).

rodbc

Logical; if TRUE, use the RODBC package to run the query (avoids encoding error if using a secret key).

rodbc_dsn

The DSN of the connection to use with RODBC (only needs to be set if not using the prod server).
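
The config, config_url, and config_file arguments are alternative ways of supplying the same YAML settings, so at most one of them should be set. A minimal sketch of that check follows, using a hypothetical helper, config_source(), that is not part of the package. Whether copy_into_f itself stops or silently picks one source when several are given is not stated here, so treat the error below as an assumption.

```r
# Hypothetical helper; not part of apde. Reports which config source was given.
config_source <- function(config = NULL, config_url = NULL, config_file = NULL) {
  # TRUE for each argument the caller actually supplied
  supplied <- c(config = !is.null(config),
                config_url = !is.null(config_url),
                config_file = !is.null(config_file))
  if (sum(supplied) > 1) {
    stop("Specify only one of: config, config_url, config_file")
  }
  if (!any(supplied)) {
    return("none")
  }
  # Name of the single argument that was supplied
  names(supplied)[supplied]
}

# The file name is a placeholder
src <- config_source(config_file = "load_table.yaml")
```

Running a check like this before calling copy_into_f makes a conflicting combination fail early, before any connection to the warehouse is used.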

Details

Plans for future improvements:

  • Add warning when table is about to be overwritten.

  • Add other options for things we're not using (e.g., file_format).

Value

None

Author(s)

Alastair Matheson, 2019-04-04

Examples

 ## Not run: 
  # ENTER EXAMPLES HERE
 
## End(Not run)
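
As an illustration of how the arguments fit together, the sketch below collects the settings for a gzip-compressed CSV load with a header row and wraps the call in a small function. Every value is a placeholder: the schema and table names, the data lake path, the DSN, and the wrapper run_copy() are hypothetical. The call is left commented out because it needs a live warehouse connection.

```r
# All values are placeholders; adjust them to your environment.
load_args <- list(
  server = "hhsaw",
  to_schema = "my_schema",                      # hypothetical schema
  to_table = "my_table",                        # hypothetical table
  db_name = "hhs_analytics_workspace",
  dl_path = "<path-to-files-in-data-lake>",     # placeholder, not a real path
  file_type = "csv",
  compression = "gzip",
  field_term = ",",
  first_row = 2,                                # data starts after one header row
  overwrite = TRUE
)

# Hypothetical wrapper: passes an existing odbc connection plus the settings above.
# apde::copy_into_f is only looked up when run_copy() is actually called.
run_copy <- function(conn) {
  do.call(apde::copy_into_f, c(list(conn = conn), load_args))
}

## Not run:
# conn <- DBI::dbConnect(odbc::odbc(), dsn = "my_dsn")  # hypothetical DSN
# run_copy(conn)
## End(Not run)
```

Keeping the settings in a list makes it easy to reuse them across tables by changing only to_table and dl_path.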


PHSKC-APDE/apde documentation built on April 14, 2025, 10:46 a.m.