bqDefaultLocation()
- New function that reads the default location from the BIGQUERY_LOCATION environment variable.
bqCreateDataset()
- Added location argument, which defaults to bqDefaultLocation().
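A minimal usage sketch of the two entries above; the dataset names are illustrative and the exact behaviour is assumed from the notes:

```r
# Illustrative sketch; assumes the package is attached.
Sys.setenv(BIGQUERY_LOCATION = "EU")

bqDefaultLocation()  # reads the BIGQUERY_LOCATION environment variable

# location defaults to bqDefaultLocation(), but can be overridden:
bqCreateDataset("my_dataset")                   # created in the default location
bqCreateDataset("us_dataset", location = "US")  # explicit location
```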
gsAuth()
- Upgraded to use googlesheets4 instead of googlesheets.
gsLoadSheet()
- Upgraded to use googlesheets4 instead of googlesheets. The parameters verbose, lookup and visibility were removed, as they are not available/required in googlesheets4.
gsLoadAll()
- Upgraded to use googlesheets4 instead of googlesheets. The parameters verbose, lookup and visibility were removed, as they are not available/required in googlesheets4.
Created test.googlesheet.R to test googlesheets functions based on a Google Sheet shared with the encrypted token.
gsLoadAll()
- Added Sys.sleep(6) before each call.
bqCreateTable()
- Now keeps field descriptions on WRITE_TRUNCATE disposition through a workaround. See schemaUpdateOptions[] in https://cloud.google.com/bigquery/docs/reference/rest/v2/Job: for normal tables, WRITE_TRUNCATE will always overwrite the schema.
bqInsertLargeData()
- Locked JSON output for POSIXt to ISO 8601 format.
The GCS_DEFAULT_BUCKET envar was changed back to the previous GCS_BUCKET, as it is already being used.

This version has minor bug fixes and refactoring within functions.
bqCreateTable()
- Refactored to separate parameterised query behaviour between legacy and standard SQL. Legacy SQL no longer takes any parameters; the query template should be pre-processed before the function call.
bqInsertLargeData()
- Refactored to load data through a temporary file in GCS.
bqCreateTable()
- If a field description in the schema file is updated, the table will now be patched accordingly.
bqAlterTable()
- Allows setting table options.
bqUpdateTableDescription()
- Updates the description option on the table.
with_mock_bigquery()
- Test execution wrapper that lets you call bqExecuteQuery() in tests seamlessly while targeting stub data for a given query file.
bqImportData()
- Revived the function by removing references to the S3 default root.
bqCreateDataset()
- Writes a warning if the dataset already exists instead of throwing an error.
dcWriteCustomMetrics()
- Added revenue and currency.code as arguments.
dcPredictionBody()
- Added revenue and currency.code as arguments.
dcListConversions()
- Added revenue and currency.code as arguments.
bqCopyDatasetSchema()
- Copies all tables from a given dataset to another one using metadata.
bqCopyTableSchema()
- Creates a new empty table using the metadata of an existing table.
bqInitiateTable()
- Accepts a partitioning parameter that allows partitioning a table with a list of field names.
bqAssertUnique()
- New function; throws an exception if duplicates are found on the primary key.
bqCountDuplicates()
- New function; returns the count of duplicate rows when grouped by key(s).
gsLoadSheet()
- Added verbose, lookup and visibility params, which were taking default values and causing issues when loading a private sheet.
gsLoadAll()
- Added verbose, lookup and visibility params to be passed to gsLoadSheet().
s3GetFile.zip()
- Added fread.fill param to the function to prevent an R session error on large files.
dcListConversions()
- Uses the environment variables DOUBLECLICK_SEGMENTATION_TYPE and DOUBLECLICK_SEGMENTATION_NAME to allow pushing metrics to new DoubleClick activities.
bqInsertData()
- Added initiation capability via the schema.file argument.
bqInsertLargeData()
- Added initiation capability via the schema.file argument.
bqCreateTable()
- Added initiation capability via the schema.file argument.
bqTransformPartition()
- Made missing.dates a parameter of the function.
bqDownloadQuery()
- Allows loading data from BigQuery via Storage.
readSqlGlue()
- New function to interpolate variables passed in the ellipsis into the text of the file.
bqCreatePartitionTable()
- Changed the missing.dates parameter logic.
bqExecuteDml()
- Takes parameters.
bqExecuteSql()
- #127 is resolved; you can now pass parameters to a query with the explicit bq_param_array class (@byapparov, #129).
dcWriteCustomMetrics()
- Added the correction for the binary read.
sqlRangeLabel(), sqlRangeIndex()
- Functions allow creating CASE statements from a limits vector that defines ranges (@byapparov, #122).
Switched to versions of bigrquery above 1.2.0.
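The parameterised-query support noted above (#127, #129) might look like this in use; the dataset, table, and parameter names are illustrative, and this sketch is not verified against the package:

```r
library(bigrquery)  # provides bq_param_array()

# Vector values are passed explicitly as an array parameter (standard SQL):
res <- bqExecuteSql(
  "SELECT * FROM `my_dataset.events` WHERE country IN UNNEST(@countries)",
  countries = bq_param_array(c("UK", "DE"))
)
```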
Fixed test for bqExecuteDml().
Added gargle for access token encryption, which allows running the full BigQuery test suite on Travis; see the article on how to manage tokens securely.
bqExecuteDml()
- Added support for DML statement execution that can be run with different priority without loading data to the server.
bqPatchTable()
- Now uses field name and type for matching.
bqExecuteQuery() and bqExecuteFile()
- Fixed to allow parameterised queries with vector values in params.
bqInitiateTable()
- Will fail if the schema file is missing fields compared to the target table.
bqPatchTable()
- Function that allows updating table fields using the schema file.
bqCreatePartitionTable()
- Added use.legacy.sql parameter to simplify control of the SQL type.
bqInsertLargeData()
- New function to split large data into chunks, which are then inserted into the BigQuery table iteratively.
s3GetData()
- Renamed s3Get.FUN to s3.get.fun to comply with coding style.
Style changes and .lintr added. Lint checks added to testthat to make code validation part of CI.
bqTransformPartition(), bqRefreshPartitionData()
- Added parameter to control the SQL dialect of BigQuery.
bqCreatePartitionTable()
- Updated to create a partition from several shard tables with one combined query. This is done to reduce the number of changes against the target table to meet the limit of 5000 changes per day.
bqCreateTable()
- The write_disposition argument was renamed to write.disposition.
bqExecuteQuery(), bqExecuteSql(), bqExecuteFile()
- (#93, @byapparov) The use.legacy.sql argument is available to switch between SQL dialects in BigQuery; named arguments (e.g. my.field.name) to these functions will be turned into query params if the standard dialect is used.
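A sketch of the named-argument behaviour described above; the query text and parameter name are illustrative assumptions, not taken from the package:

```r
# With the standard dialect, named arguments become query parameters:
res <- bqExecuteQuery(
  "SELECT * FROM `my_dataset.events` WHERE country = @country",
  country = "UK",             # turned into the @country query param (assumed mapping)
  use.legacy.sql = FALSE      # standard SQL dialect
)
```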
getExistingDates()
- Deprecated and removed in favour of bqExistingPartitionDates().
bqGetData()
- Deprecated in favour of bqExecuteQuery() and bqExecuteFile().
bqGetColumnNames()
- Deprecated; could not find it being used. Lower-level calls are deprecated as well.
bqImportData()
- Allows importing a GS file into a BigQuery table. By default imports the mirror file from table-name.csv.gz. Format and compression params control the file extension.
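A usage sketch of the defaults described above; the argument names are assumptions based on the note:

```r
# Imports the mirror file "my_table.csv.gz" by default (per the note above):
bqImportData("my_table")

# Format and compression params control the file extension (names assumed):
bqImportData("my_table", format = "csv", compression = "gzip")
```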
bqExtractTable()
- Allows saving a table to a GS file; you only need to specify the table name and format, everything else will be mapped automatically.
getExistingPartitionDates()
- Replaced by bqExistingPartitionDates().
gaGetShop()
- Removed from the package; the map of datasets should be created externally.
bqInsertData()
- Lost the job.name and increment.field parameters, as all ETL logging functions moved to rmeta.
All functions related to metadata logging and dependent on InfluxDB were moved to the rmeta package.
gdLoadReport()
- Function moved to the rGoodData package.
bqRefreshPartitionData(), bqTransformPartition()
- priority parameter added to the functions (#86).
createRangeTable()
- Fully replaced by the bqCreatePartitionTable() and bqTransformPartition() functions (#86).
bqCreatePartitionTable()
- Added priority parameter that allows executing BigQuery jobs in BATCH mode.
dcPredictionBody()
- Function is vectorised, which means it can turn multiple transactions into a single body request. This is a breaking change, as the custom.metrics param is now a list of vectors.
s3GetFile(), s3PutFile()
- Functions changed to read data based on the extension of the file. For example, you can use s3GetFile() instead of calling s3GetFile.csv() if the path ends with .csv.
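A sketch of the extension-based dispatch described above; the bucket and path are illustrative:

```r
# The reader is picked from the file extension, per the note above:
dt <- s3GetFile("s3://my-bucket/exports/events.csv")  # same as s3GetFile.csv()
```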
bqInsertData()
- Added fields parameter to force BigQuery types to the given types.
disaggregate()
- New function to split a data.table from aggregated to individual lines.
bqSaveSchema(), bqExtractSchema()
- Allow saving the schema from a given dataset into a JSON file.
bqDeleteDataset()
- Before deletion, the presence of the delete:never label key-value pair is checked, which protects datasets from programmatic deletion. (#79)
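A sketch of the label guard described above; the dataset names are hypothetical and the exact failure mode (skip vs. error) is an assumption:

```r
# Datasets without the protective label can be deleted programmatically:
bqDeleteDataset("scratch_dataset")

# Refused: this dataset carries the delete:never label (assumed example):
bqDeleteDataset("core_dataset")
```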
bqProjectDatasets()
- Lists datasets in the project. (#79)
bqProjectTables()
- Extracts metadata for all tables in the project by extracting __TABLES__ for each dataset. (#79)
bqUseLegacySql()
- Allows checking or setting the flavour of BigQuery SQL. (#79)
s3ListFiles()
- Gets metadata of S3 files matching a given path into a data.table. (#80)
Lower-level BigQuery API calls were updated to the new functions from bigrquery 1.0.0.
bqDeleteDataset()
- Deletes a dataset (#71).
bqCreateDataset()
- Creates a dataset (#71).
bqTableSchema()
- Loads the table schema as a bq_fields object.
influxLog(), influxConnection()
- Changed the API of the influx wrapper functions to default values. (#64, @byapparov)
bqRefreshPartitionData()
- New function to allow batch updates of the partitioned table data. (#66, @byapparov)