orbis.save.rds: Read a raw Orbis data table into multiple .rds files

Description

Reads a raw Orbis data table in batches and saves it as multiple .rds files. Designed to be run from the directory where the data is located.

Usage

orbis.save.rds(txt.file.name, txt.file.dir = getwd(),
  data.codes.file = character(0), data.codes.dir = file.path(getwd(),
  "orbis-var-names"), select.codes = character(0), txt.skip.lines = 2,
  txt.nlines = NA, batch.nlines = 10^7,
  batch.file.dir = character(0), batch.file.name = character(0),
  batch.file.name.prefix = "", batch.file.name.sufix = "",
  save.rds = TRUE, return.invisible = FALSE,
  harmonize.cols = character(0), harmonize.progress.by = 10^5,
  harmonize.quite = FALSE, harmonize.procedures = list(list("toascii",
  FALSE), "remove.brackets", "toupper", "apply.nber", "remove.spaces"))
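
A call might look like the sketch below. The file name is hypothetical, and the sketch assumes that the function is exported from the orbisr package and that a matching codes .csv exists in ./orbis-var-names; the call is skipped when the package or the file is missing.

```r
# Hypothetical arguments; replace the file name with your own raw Orbis export.
args <- list(
  txt.file.name  = "orbis-export.txt",  # hypothetical file in the working directory
  txt.skip.lines = 2,                   # same as the documented default
  batch.nlines   = 10^6                 # smaller batches than the 10^7 default
)

# Run only if the package is installed and the raw file is present.
if (requireNamespace("orbisr", quietly = TRUE) &&
    file.exists(args$txt.file.name)) {
  rds.files <- do.call(orbisr::orbis.save.rds, args)
  print(rds.files)  # per the Value section, a vector of .rds file names
}
```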

Arguments

txt.file.name

File name of the raw Orbis data.

txt.file.dir

Path to the directory with the raw data. Default is the working directory.

data.codes.file

Name of the .csv file with codes. Default is the same as txt.file.name but with the .csv extension.

data.codes.dir

Path to the directory containing the .csv file with codes. Default is ./orbis-var-names

select.codes

A character vector of fields (codes) to select. Default is all.

txt.skip.lines

Number of header lines to skip at the top of the raw data file. The default is 2.

txt.nlines

Number of lines in the raw data file. By default (NA) it is calculated with grep.

batch.nlines

Number of lines to read in each batch. The default is 10^7.

batch.file.dir

Path for saving .rds files. The default is the same directory as 'txt.file.name'.

harmonize.cols

Which columns to harmonize. (Requires harmonizer package.)

harmonize.progress.by

(Requires harmonizer package.) Numeric value used to split the org.names vector into chunks so that the percentage of completion can be shown. A value of 0 means the vector is not split and no progress percentage is shown. The default here is 10^5. Designed to be used for long vectors of strings.

harmonize.quite

(Requires harmonizer package.) Logical value indicating whether to suppress messages about the progress of procedures. Default is FALSE.

harmonize.procedures

(Requires harmonizer package.) List of harmonization procedures. Each procedure can be specified either as a string with the procedure name (see details for procedure names) or as a list whose first element is the procedure name (string) and whose remaining elements are passed as arguments to that procedure.
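
The two ways of specifying a procedure (a bare name, or a list holding a name followed by arguments) can be illustrated with a minimal base R sketch. The functions to.upper, remove.spaces.demo and apply.procedures are hypothetical stand-ins written for this example. They are not the harmonizer package's code, and the real procedures may behave differently.

```r
# Hypothetical stand-in procedures (not from the harmonizer package).
remove.spaces.demo <- function(x) gsub("\\s+", "", x)
to.upper <- function(x, first.only = FALSE) {
  if (first.only) {
    paste0(toupper(substring(x, 1, 1)), substring(x, 2))
  } else {
    toupper(x)
  }
}

# Apply procedures in order. Each element is either a procedure name
# or a list of the form list(name, arg1, arg2, ...).
apply.procedures <- function(x, procedures) {
  for (p in procedures) {
    fun.name   <- p[[1]]            # first element is always the name
    extra.args <- as.list(p)[-1]    # empty when p is a bare string
    x <- do.call(fun.name, c(list(x), extra.args))
  }
  x
}

procedures <- list(list("to.upper", FALSE), "remove.spaces.demo")
harmonized <- apply.procedures(c(" acme  ltd ", "foo bar"), procedures)
print(harmonized)
```

Treating a bare string as a one-element list keeps the dispatch loop uniform, which is one plausible reason for accepting both forms.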

Value

A vector of .rds file names.
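
The general idea of skipping header lines, reading a fixed number of lines per batch, and returning the names of the saved .rds files can be sketched in base R as follows. This is an illustration under simplifying assumptions, not the package's implementation: save.in.batches and the batch-NNN.rds naming are hypothetical, and each batch is stored as raw text lines rather than as a parsed table.

```r
# Hypothetical helper: read a text file in batches and save each batch as .rds.
save.in.batches <- function(txt.file, out.dir, skip.lines = 2, batch.nlines = 3) {
  con <- file(txt.file, open = "r")
  on.exit(close(con))
  # Drop the header lines before reading data.
  if (skip.lines > 0) invisible(readLines(con, n = skip.lines))
  rds.files <- character(0)
  repeat {
    lines <- readLines(con, n = batch.nlines)  # continues where the last read stopped
    if (length(lines) == 0) break              # end of file reached
    rds.file <- file.path(out.dir,
                          sprintf("batch-%03d.rds", length(rds.files) + 1))
    saveRDS(lines, rds.file)
    rds.files <- c(rds.files, rds.file)
  }
  rds.files
}

# Demo on a temporary file with 2 header lines and 7 data lines.
tmp.dir <- tempfile("orbis-demo-")
dir.create(tmp.dir)
txt.file <- file.path(tmp.dir, "raw.txt")
writeLines(c("header 1", "header 2", paste("row", 1:7)), txt.file)

rds.files <- save.in.batches(txt.file, tmp.dir)
print(basename(rds.files))
last.batch <- readRDS(rds.files[length(rds.files)])
```

Each saved batch can be read back independently with readRDS, so later processing never needs the whole raw file in memory at once.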


stasvlasov/orbisr documentation built on May 20, 2019, 1 a.m.