simple_version_control: Compare two versions of data with the same length and...

View source: R/simple_vc.R

simple_version_control R Documentation

Compare two versions of data with the same length and mark new data in a version control column

Description

Compares two versions of data of the same length and marks new data in a version control column.

Usage

simple_version_control(
  dt,
  id,
  oldcol = NULL,
  newcol,
  type = "list",
  out = NULL,
  vccol,
  olddate = NULL,
  newdate = Sys.Date()
)

Arguments

dt

Input data

id

Unique ID column

oldcol

Column of historic data

newcol

Column of updated data

type

One of "list" or "flat". Default is "list".

out

One of "table", "vector" or NULL. NULL (the default) returns the table and writes it to an rds file.

vccol

Optional; name of an existing version control column, or the name to assign to a new one. If missing, the default is the common part of the oldcol and newcol strings followed by _VC, or oldcol if the two have no common string.

olddate

Optional; date of previous data. If NULL (the default), "original" is used.

newdate

Optional; date of new data. Default is system date.
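
The default naming rule described for vccol can be sketched as follows. This is a hypothetical illustration, not the package's code: the helper name default_vc_name is invented here, "common part" is assumed to mean the shared leading characters of the two names, and the fallback is assumed to return oldcol unchanged.

```r
# Hypothetical helper, not part of simpleepi.
# Assumption: the "common part" is the longest shared leading substring.
default_vc_name <- function(oldcol, newcol) {
  old_chars <- strsplit(oldcol, "")[[1]]
  new_chars <- strsplit(newcol, "")[[1]]
  n <- min(length(old_chars), length(new_chars))
  # Compare the names character by character up to the shorter length
  same <- old_chars[seq_len(n)] == new_chars[seq_len(n)]
  # Number of leading characters shared before the first mismatch
  n_common <- if (all(same)) n else which(!same)[1] - 1
  # Assumption: fall back to oldcol as-is when nothing is shared
  if (n_common == 0) oldcol else paste0(substr(oldcol, 1, n_common), "_VC")
}

default_vc_name("valueprev", "value")
```

Under these assumptions, oldcol = "valueprev" and newcol = "value" share the leading string "value", so the sketch names the column "value_VC".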

Value

Returns vccol updated with the changes dated newdate. For each ID, the first record which is not NA is recorded, and any changes to the oldcol values are then added. With type = "list", new data are added as new rows for each ID. With type = "flat", data are added to vccol in the format olddate;oldcol and any changes are added as newdate;newcol.
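
To make the flat format concrete, here is a minimal base R sketch of how olddate;oldcol and newdate;newcol entries could be assembled. It is an illustration only: flat_vc and its entry_sep argument are hypothetical, the separator between the old and new entries is an assumption (the text above does not state it), and only NA is treated as missing. The output of simple_version_control() itself may differ.

```r
# Hypothetical illustration of the flat format; not the package implementation.
flat_vc <- function(old, new, olddate = "original",
                    newdate = format(Sys.Date()), entry_sep = ";") {
  vc <- rep(NA_character_, length(old))
  has_old <- !is.na(old)
  # Stamp each non-NA historic value as olddate;oldcol
  vc[has_old] <- paste(olddate, old[has_old], sep = ";")
  # A change is a non-NA new value that differs from the historic value
  changed <- !is.na(new) & (!has_old | new != old)
  entry <- paste(newdate, new, sep = ";")
  # Append newdate;newcol after an existing entry (separator is an assumption)
  both <- changed & has_old
  vc[both] <- paste(vc[both], entry[both], sep = entry_sep)
  # With no historic value, the new entry stands alone
  only_new <- changed & !has_old
  vc[only_new] <- entry[only_new]
  vc
}

old <- c("A", "B", NA, "D")
new <- c("A", "Test", NA, "C")
vc <- flat_vc(old, new, olddate = "old", newdate = "new")
print(vc)
# "old;A" "old;B;new;Test" NA "old;D;new;C"
```

Unchanged values keep only their olddate entry, and rows where both columns are NA stay NA.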

Examples

# library(data.table)
# library(simpleepi)
#
# Flat format: one version control string per row
# dt <- data.table(c(1:6),
#             c("A", "B", "C", NA, "D", ""),
#             c("", "Test", NA, NA, "C", NA))
# dt$VC <- simple_version_control(dt,
#   oldcol = "V2", newcol = "V3", id = "V1",
#   olddate = "old", newdate = "new",
#   type = "flat",
#   out = "vector")
#
# List format
# dt$VC <- simple_version_control(dt,
#    oldcol = "V2", newcol = "V3", id = "V1",
#    olddate = "old", newdate = "new",
#    type = "list",
#    out = "vector")
#
# Process by chunk: split into blocks of 10,000,000 rows
# ds <- split(dt, (seq_len(nrow(dt)) - 1) %/% 10000000)
# for (s in seq_along(ds)) {
#   ds[[s]][, variable_VC :=
#         simple_version_control(dt = ds[[s]],
#                id = "key",
#                oldcol = "valueprev",
#                newcol = "value",
#                olddate = "20220503",
#                newdate = "20220510",
#                type = "flat",
#                out = "vector",
#                vccol = "variable_VC")]
# }
# Recombine the chunks into one table
# ds <- rbindlist(ds, use.names = TRUE, fill = TRUE)


DHatziioanou/simpleepi documentation built on Sept. 24, 2024, 5:25 a.m.