new_file_structure_sas_: Helper function for 'new_file_structure_sas()'


View source: R/file_structure.R

Description

In order to read a data file, you need to create a file_definition class object, which holds the path to the data file and all file structure definitions needed to read it. Often, however, multiple files share the same file structure. In this case it is useful to create a file_structure object, which holds only the file structure definitions, and to reuse this file_structure object when creating the file_definition objects for the individual files. For each file type there is a separate file_structure constructor:

Usage

new_file_structure_sas_(
  specification_files = NULL,
  file_meta = NULL,
  skip_rows,
  n_max,
  encoding = NULL,
  to_lower,
  rename_cols,
  retype_cols,
  adapters = new_adapters(),
  err_h,
  class = NULL,
  ...
)

Arguments

specification_files

An optional character vector holding the paths to the files in which the file structure is described.

file_meta

An optional file_meta class object, holding some meta information for each data column (column description, possible column values + descriptions of possible column values). For details see section meta information. If the argument cols is not NULL, then the argument file_meta must be omitted.

skip_rows

The number of rows to be skipped. In the case of DSV or EXCEL files: If the argument header is set to TRUE, then the first row is always assumed to be the header row.

n_max

A number, defining the maximum number of rows to be read. If n_max = Inf, then all available rows will be read.

encoding

A string defining which encoding should be assumed when reading the data file. The following values are allowed:

  • "UTF-8": For UTF-8 encoded files.

  • "latin1": For ISO 8859-1 (also called Latin-1) encoded files. This encoding is almost the same as Windows-1252 (also called ANSI). They differ only in 32 symbol codes (special symbols that are rarely used). In the case of SAS files, it is possible to set encoding = NULL. In this case, the encoding defined in the SAS data file header will be used.
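Why the assumed encoding matters can be illustrated with a small base R sketch. It does not use readall itself; the sample word and the variable names are assumptions chosen for illustration. The same bytes are invalid when interpreted as UTF-8, but decode cleanly once they are declared as Latin-1:

```r
# Bytes of the word "Grüße" in Latin-1 (ü = 0xFC, ß = 0xDF).
latin1_bytes <- as.raw(c(0x47, 0x72, 0xFC, 0xDF, 0x65))
txt <- rawToChar(latin1_bytes)

# Interpreted as UTF-8, this byte sequence is not valid.
is_valid_utf8 <- validUTF8(txt)

# Declaring the source encoding as Latin-1 allows a clean conversion.
utf8_txt <- iconv(txt, from = "latin1", to = "UTF-8")

n_chars <- nchar(utf8_txt, type = "chars")  # number of characters
n_bytes <- nchar(utf8_txt, type = "bytes")  # ü and ß take two bytes each in UTF-8
```

After the conversion, the string has five characters but occupies seven bytes, because the two non-ASCII characters are encoded with two bytes each in UTF-8.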

to_lower

A logical flag defining whether the column names should be transformed to lower case after reading the data set (by calling read_data()). This transformation is applied before the column names are compared (in the case of SAS files, or DSV and EXCEL files with header = TRUE).
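The effect of lowering the column names before comparing them can be sketched in base R. This is only an illustration of the idea, not the package's internal code; the data frame and the expected names are made up:

```r
# Hypothetical data set whose file header uses mixed-case column names.
df <- data.frame(ID = 1:2, Name = c("a", "b"))

# Hypothetical column names given in the file structure.
expected <- c("id", "name")

# Without the transformation the names do not match.
match_before <- identical(names(df), expected)

# Lower the column names first, then compare.
names(df) <- tolower(names(df))
match_after <- identical(names(df), expected)
```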

rename_cols

A logical value, which defines if the columns given in the data file should be overwritten by the columns given in argument col_names. If col_names is not given, then rename_cols has no effect.

retype_cols

A logical value which defines whether the types of the columns given in the SAS file should be changed to the types given in the col_types argument. If col_types is not given, then retype_cols has no effect.

adapters

An optional list argument, holding a list of adapter functions (See section adapters).

err_h

An error handler

class

A character vector holding one of the following class names:

  • "file_structure_fwf" for FWF file structures

  • "file_structure_dsv" for DSV file structures

  • "file_structure_excel" for EXCEL file structures

...

Additional function arguments for

  • readr::read_fwf() in case of FWF files

  • utils::read.delim() in case of DSV files

  • readxl::read_excel() in case of EXCEL files

difference file_structure/file_definition/file_collection

The goal of the package readall is to read data files. For this purpose the package offers three different class objects for storing meta data about the data files:

adapters

An adapter function is a function that takes a data.frame as input argument and returns a modified version of this data.frame. The adapter functions are stored in an adapters class object, which is a special list that contains all adapter functions and a description text for each function. These class objects can be created with the function new_adapters(). An adapters class object can be added to a file_structure, a file_definition or a file_collection class object. After reading a data file (by calling read_data(file_definition)), all adapter functions listed in the adapters argument of the file_definition class object are applied consecutively to the loaded data set. Adapter functions can be added to an existing file_structure, file_definition or file_collection class object with the function add_adapters(). Adapter functions can be used for several tasks:
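The idea of applying adapter functions consecutively can be sketched in plain base R. The sketch deliberately does not call new_adapters() or add_adapters(), whose exact signatures are not shown here; the two adapter functions and the helper apply_adapters() are hypothetical names used only for illustration:

```r
# Hypothetical adapter 1: drop rows in which every value is missing.
drop_empty_rows <- function(df) {
  df[rowSums(!is.na(df)) > 0, , drop = FALSE]
}

# Hypothetical adapter 2: append a running row number.
add_row_id <- function(df) {
  df$row_id <- seq_len(nrow(df))
  df
}

# A plain list standing in for an adapters class object (assumption).
adapter_list <- list(drop_empty_rows, add_row_id)

# Apply the adapters one after another, feeding each result into the next.
apply_adapters <- function(df, adapters) {
  Reduce(function(acc, adapter) adapter(acc), adapters, df)
}

raw <- data.frame(x = c(1, NA, 3), y = c("a", NA, "c"))
result <- apply_adapters(raw, adapter_list)
```

Because each adapter receives the output of the previous one, the order of the list matters: here the row numbers are assigned only after the empty row has been removed.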

File types

The function read_data() can read four different types of data

meta information

The col_meta class objects are used to store meta information about single data columns, such as additional column descriptions and column value/level descriptions. To store meta information about a set of columns, a file_meta class object can be used. These objects store a list of col_meta class objects, where each col_meta class object corresponds to a specific column in a data set. file_meta class objects are usually stored in file_structure or file_definition class objects. When read_data() is called, the meta information is also appended to the resulting data.frame. The meta information stored in a file_structure class object, a file_definition class object or a data.frame that has been read can be extracted with the function get_meta(). A col_meta class object holds the following information:


a-maldet/readall documentation built on Dec. 18, 2021, 9:23 p.m.