Description
To read a data file, you need to create a file_definition class object, which holds the path to the data file and all the file structure information needed to read it. Often, however, multiple files share the same file structure. In this case it is useful to create a single file_structure object, which holds only the file structure definitions, and to reuse this file_structure object to create multiple file_definition objects for the different files. For each file type there is a separate file_structure constructor:
new_file_structure_fwf()
: Create a file structure for FWF files. These are data files where the data is stored in columns of fixed character width.
new_file_structure_dsv()
: Create a file structure for DSV files. These are data files where the data is stored in columns that are separated by a delimiter character.
new_file_structure_excel()
: Create a file structure for EXCEL files.
new_file_structure_sas()
: Create a file structure for SAS files.
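The reuse idea can be illustrated without the package itself. The following base-R sketch is only a stand-in, not the readall API: a plain list named dsv_structure plays the role of a file_structure object, and the hypothetical helper read_with_structure() reads each file with utils::read.table(). Both files are read with the same settings object.

```r
# Hypothetical stand-in for a file_structure: a plain list of DSV settings
# (separator, header flag, missing-value string). Not the readall class.
dsv_structure <- list(sep = ";", header = TRUE, na = "")

# Hypothetical helper: read one file using a shared structure definition
read_with_structure <- function(path, structure) {
  utils::read.table(
    path,
    sep = structure$sep,
    header = structure$header,
    na.strings = structure$na,
    stringsAsFactors = FALSE
  )
}

# Two files that share the same layout (written to temporary paths)
path_2020 <- tempfile(fileext = ".csv")
path_2021 <- tempfile(fileext = ".csv")
writeLines(c("id;value", "1;10", "2;"), path_2020)
writeLines(c("id;value", "3;30"), path_2021)

# The same structure object is reused for both files
data_2020 <- read_with_structure(path_2020, dsv_structure)
data_2021 <- read_with_structure(path_2021, dsv_structure)
```

With readall itself, the shared object would be created with new_file_structure_dsv(); its exact arguments are listed in the Usage section, not in this sketch.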
Usage
new_file_structure_fwf(
specification_files = NULL,
cols = NULL,
col_names = NULL,
col_types = NULL,
col_start = NULL,
col_end = NULL,
col_widths = NULL,
file_meta = NULL,
sep_width = NULL,
skip_rows = 0,
na = "",
decimal_mark = ".",
big_mark = ",",
trim_ws = TRUE,
n_max = Inf,
encoding = "latin1",
to_lower = TRUE,
adapters = new_adapters(),
...
)
new_file_structure_dsv(
specification_files = NULL,
cols = NULL,
col_names = NULL,
col_types = NULL,
file_meta = NULL,
sep = ";",
header = TRUE,
skip_rows = 0,
na = "",
decimal_mark = ".",
big_mark = ",",
trim_ws = TRUE,
n_max = Inf,
encoding = "latin1",
to_lower = TRUE,
rename_cols = FALSE,
adapters = new_adapters(),
...
)
new_file_structure_excel(
specification_files = NULL,
sheet = 1,
range = NULL,
cols = NULL,
col_names = NULL,
col_types = NULL,
file_meta = NULL,
header = TRUE,
skip_rows = 0,
na = "",
trim_ws = TRUE,
n_max = Inf,
to_lower = TRUE,
rename_cols = FALSE,
adapters = new_adapters(),
...
)
new_file_structure_sas(
specification_files = NULL,
file_meta = NULL,
skip_rows = 0,
n_max = Inf,
encoding = NULL,
to_lower = TRUE,
rename_cols = FALSE,
retype_cols = FALSE,
adapters = new_adapters(),
...
)
Arguments
specification_files
: An optional character vector holding the paths to the files in which the file structure is described.
cols
: An optional list argument holding the column definitions. This argument can be used instead of the arguments …
col_names
: An optional character vector holding the names of the columns. If omitted, then the strings …
col_types
: A character vector defining the data type of each column. The following strings are allowed: …
col_start
: An optional numeric vector holding the position of the first character of each column. Generally, the argument …
col_end
: An optional numeric vector holding the position of the last character of each column. The last vector entry (for the rightmost column) is the only entry that can be …
col_widths
: An optional numeric vector holding the number of characters of each column. Generally, the argument …
file_meta
: An optional file_meta class object holding meta information for each data column (column description, possible column values, and descriptions of the possible column values). For details see section meta information. If the argument …
sep_width
: An optional number defining the number of characters between neighbouring columns (often …
skip_rows
: The number of rows to be skipped. In the case of DSV or EXCEL files: If the argument …
na
: A string representing missing values in the data file.
decimal_mark
: A character defining the decimal separator in numeric columns. Only the strings …
big_mark
: A character defining the thousands separator in numeric columns. Only the strings …
trim_ws
: A logical value defining whether character values should be stripped of all leading and trailing white space.
n_max
: A number defining the maximum number of rows to be read. If …
encoding
: A string defining which encoding should be assumed when reading the data file. The following values are allowed: …
to_lower
: A logical flag defining whether the column names should be transformed to lower case after reading the data set (by calling …
adapters
: An optional list argument holding a list of adapter functions (see section adapters).
...
: Additional function arguments for …
sep
: A string holding the column delimiter symbol.
header
: A logical value defining whether the first row contains the column headers. If set to …
rename_cols
: A logical value defining whether the column names given in the data file should be overwritten by the columns given in argument …
sheet
: A string or an integer number: …
range
: An optional string holding an EXCEL range string that defines the data range in the spreadsheet. If …
retype_cols
: A logical value defining whether the types of the columns given in the SAS file should be changed to the types given in the …
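The FWF arguments col_start, col_end and col_widths describe the same column layout in different ways. Assuming the columns are contiguous apart from a gap of sep_width characters between neighbours, the positions follow from the widths by cumulative sums. The base-R sketch below illustrates that arithmetic; the helper widths_to_positions() is hypothetical, and whether readall applies exactly this conversion is an assumption.

```r
# Hypothetical helper: derive start/end positions from column widths,
# assuming `sep_width` filler characters between neighbouring columns.
widths_to_positions <- function(col_widths, sep_width = 0) {
  n <- length(col_widths)
  # Each column starts after all previous columns and their separators
  col_start <- cumsum(c(1, col_widths[-n] + sep_width))
  # The last character of a column is its start plus its width minus one
  col_end <- col_start + col_widths - 1
  list(col_start = col_start, col_end = col_end)
}

# Example: three columns of width 4, 2 and 3, separated by one blank
pos <- widths_to_positions(c(4, 2, 3), sep_width = 1)
line <- "2021 DE 042"

# Cut the line into fields at the computed positions
fields <- substring(line, pos$col_start, pos$col_end)
```

Here the columns occupy characters 1 to 4, 6 to 7 and 9 to 11, so fields holds "2021", "DE" and "042".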
Difference between file_structure, file_definition and file_collection
The goal of the package readall is to read data files. For this purpose the package offers three different class objects for storing meta data about the data files:
file_structure class objects: Objects of this class define all file-type-specific information (e.g. column positions, column names, column types, delimiter symbols, rows to skip, etc.). The idea is that one file_structure object may be valid for several files and can therefore be used to read multiple data files.
file_definition class objects: Objects of this class contain all information needed to read a single specific data file (path to the data file, file structure, etc.). A file_definition class object contains a file_structure, which holds all file-type-specific information, as well as other information that is valid only for this specific file.
file_collection class objects: A file_collection class object is simply a list holding multiple file_definition class objects. It can be used to read several data files at once and concatenate the data into a single data.frame.
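The concatenation step of a file_collection can be pictured in base R: each file yields a data.frame with the same columns, and the results are stacked row-wise. In this sketch the list collection is a hypothetical stand-in for the per-file read results; it is not how readall implements file_collection internally.

```r
# Hypothetical per-file results, as they might come back from reading
# each file_definition in a collection (same columns in every file)
collection <- list(
  data.frame(year = 2020L, id = 1:2, value = c(10, 20)),
  data.frame(year = 2021L, id = 3L, value = 30)
)

# Concatenate all per-file data sets into a single data.frame
combined <- do.call(rbind, collection)
```

Stacking with rbind() requires matching column names in every data set, which is one reason why adapter functions are used to harmonise data from different years.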
Adapters
An adapter function is a function that takes a data.frame as input argument and returns a modified version of this data.frame. The adapter functions are stored in an adapters class object, which is a special list that contains all adapter functions and a description text for each function. These class objects can be created with the function new_adapters().
The adapters class objects can be added to a file_structure, a file_definition, or a file_collection class object. After a data file is read (by calling read_data(file_definition)), all adapter functions listed in the adapters argument of the file_definition class object (see new_file_definition()) are applied consecutively to the loaded data set.
Adapter functions can be added to an existing file_structure, file_definition, or file_collection class object by using the function add_adapters().
Adapter functions can be used for several tasks:
adapt the data sets in such a way that they can be concatenated across multiple years
compute new variables from existing variables
fix errors in variables
transform the values of a variable of an older data set so that it complies with a newer variable definition
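The consecutive application of adapter functions can be sketched with base R's Reduce(). The two adapters below are hypothetical examples of the "fix errors" and "compute new variables" tasks, and the plain list standing in for an adapters object is an assumption; it does not carry the description texts of the real class.

```r
# Two hypothetical adapter functions: each takes a data.frame and
# returns a modified copy of it
fix_negative_income <- function(df) {
  # Fix errors: treat negative incomes as missing
  df$income[df$income < 0] <- NA
  df
}
add_income_k <- function(df) {
  # Compute a new variable from an existing one
  df$income_k <- df$income / 1000
  df
}

# Plain list standing in for an adapters object (not the readall class)
adapters <- list(fix_negative_income, add_income_k)

# Apply the adapters one after another, in list order
apply_adapters <- function(df, adapters) {
  Reduce(function(data, adapter) adapter(data), adapters, df)
}

raw <- data.frame(id = 1:3, income = c(52000, -1, 31000))
adapted <- apply_adapters(raw, adapters)
```

Because the adapters run in list order, add_income_k() already sees the corrected income column, so income_k becomes 52, NA and 31.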
File types
The function read_data() can read four different types of data files:
FWF
: Fixed width files. These are text files where the data is stored in columns that have a fixed character width.
DSV
: Delimiter-separated value files. These are text files where the data is stored in columns that are separated by a delimiter character.
EXCEL
: An Excel file holding the data.
SAS
: A SAS file holding the data.
To read a data file with the function read_data(), it is useful to create a file_definition or file_structure class object holding all needed file structure information:
new_file_definition_fwf() or new_file_structure_fwf() for FWF files
new_file_definition_dsv() or new_file_structure_dsv() for DSV files
new_file_definition_excel() or new_file_structure_excel() for Excel files
new_file_definition_sas() or new_file_structure_sas() for SAS files
Meta information
The col_meta class objects are used to store meta information about single data columns, such as additional column descriptions and descriptions of column values or levels. To store meta information about a set of columns, a file_meta class object can be used. These objects store a list of col_meta class objects, where each col_meta class object corresponds to a specific column in a data set. The file_meta class objects are usually stored in file_structure or file_definition class objects. When read_data() is called, the meta information is also attached to the resulting data.frame.
The meta information stored in a file_structure, a file_definition class object, or a data.frame returned by read_data() can be extracted with the function get_meta().
A col_meta class object holds the following information:
desc
: A string holding the column description.
values
: A vector (character/logical/numeric) usually holding the possible column values (e.g. c(1, 2)) or a more abstract text version of the column values (e.g. c("JJJJMMDD", "99999999", "")).
values_desc
: A character vector that corresponds to the values vector. Each entry of values_desc is a more detailed description of the corresponding entry in values. If some descriptions are missing, the corresponding entries are NA values.
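The positional correspondence between values and values_desc can be mimicked with a plain R list. The list layout below is an assumption that only mirrors the three fields described above; it is not the readall col_meta constructor. match() finds the position of each observed value, which then selects the matching description.

```r
# Plain list mirroring the three col_meta fields (not the readall class)
sex_meta <- list(
  desc = "Sex of the respondent",
  values = c(1, 2, 9),
  # One description per entry of `values`; NA where none is available
  values_desc = c("male", "female", NA)
)

# Look up the description belonging to each observed value
observed <- c(2, 1, 9)
labels <- sex_meta$values_desc[match(observed, sex_meta$values)]
```

The value 9 has no description, so its label is NA, as described for values_desc.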