View source: R/file_definition.R
In order to read a data file with read_data(), you need to create a new file_definition object.
The following functions are available:
new_file_definition(): Creates a file_definition object for FWF, DSV, EXCEL or SAS data files, depending on the file type supported by the given file_structure class object.
new_file_definition_fwf(): Creates a file_definition object for FWF files.
new_file_definition_dsv(): Creates a file_definition object for DSV files.
new_file_definition_excel(): Creates a file_definition object for EXCEL files.
new_file_definition_sas(): Creates a file_definition object for SAS files.
new_file_definition(
file_path,
file_structure,
to_lower = NULL,
cols_keep = TRUE,
extra_col_name = NULL,
extra_col_val = NULL,
extra_col_file_path = FALSE,
extra_adapters = new_adapters()
)
new_file_definition_fwf(
file_path,
specification_files = NULL,
cols = NULL,
col_names = NULL,
col_types = NULL,
col_start = NULL,
col_end = NULL,
col_widths = NULL,
file_meta = NULL,
sep_width = NULL,
skip_rows = 0,
na = "",
decimal_mark = ".",
big_mark = ",",
trim_ws = TRUE,
n_max = Inf,
encoding = "latin1",
to_lower = TRUE,
adapters = new_adapters(),
cols_keep = TRUE,
extra_col_name = NULL,
extra_col_val = NULL,
extra_col_file_path = FALSE,
...
)
new_file_definition_dsv(
file_path,
specification_files = NULL,
cols = NULL,
col_names = NULL,
col_types = NULL,
file_meta = NULL,
sep = ";",
header = TRUE,
skip_rows = 0,
na = "",
decimal_mark = ".",
big_mark = ",",
trim_ws = TRUE,
n_max = Inf,
encoding = "latin1",
to_lower = TRUE,
rename_cols = FALSE,
adapters = new_adapters(),
cols_keep = TRUE,
extra_col_name = NULL,
extra_col_val = NULL,
extra_col_file_path = FALSE,
...
)
new_file_definition_excel(
file_path,
specification_files = NULL,
sheet = 1,
range = NULL,
cols = NULL,
col_names = NULL,
col_types = NULL,
file_meta = NULL,
header = TRUE,
skip_rows = 0,
na = "",
trim_ws = TRUE,
n_max = Inf,
to_lower = TRUE,
rename_cols = FALSE,
adapters = new_adapters(),
cols_keep = TRUE,
extra_col_name = NULL,
extra_col_val = NULL,
extra_col_file_path = FALSE,
...
)
new_file_definition_sas(
file_path,
specification_files = NULL,
file_meta = NULL,
skip_rows = 0,
n_max = Inf,
encoding = NULL,
to_lower = TRUE,
rename_cols = FALSE,
retype_cols = FALSE,
adapters = new_adapters(),
cols_keep = TRUE,
extra_col_name = NULL,
extra_col_val = NULL,
extra_col_file_path = FALSE,
...
)
file_path |
A string holding the path to the data file. |
file_structure |
A file_structure class object.
Objects of this type can be created by the functions
|
to_lower |
A logical flag, defining if the names of the columns should
be transformed to lower case after reading the data set (by calling
|
cols_keep |
Either |
extra_col_name |
An optional string, which defines the name of the column that
will be added to the data set (after reading it with function |
extra_col_val |
An optional value (any atomic type), which will be added
(after reading the data set with function |
extra_col_file_path |
Either |
extra_adapters |
An optional adapters class object, which holds a list of adapter functions. These adapter functions will be added to the adapter functions already stored in the file_structure class object. For further details on adapter functions see section adapters. |
specification_files |
An optional character vector holding the paths to the files, where the file structure is described. |
cols |
An optional list argument, holding the column definitions.
This argument can be used instead of the arguments
|
col_names |
An optional character vector holding the names of the columns.
If omitted, then the strings |
col_types |
A character vector defining the data types for each column.
The following strings are allowed: |
col_start |
An optional numeric vector holding the positions of the first character
of each column.
Generally, the argument |
col_end |
An optional numeric vector holding the positions of the last character
of each column. The last vector entry (for the most right column)
is the only entry that can be |
col_widths |
An optional numeric vector holding the numbers of characters
of each column.
Generally, the argument |
file_meta |
An optional file_meta class object,
holding some meta information for each data column
(column description, possible column values + descriptions of possible
column values).
For details see section meta information.
If the argument |
sep_width |
An optional number, defining the number of characters
between each column (often |
skip_rows |
The number of rows to be skipped. In the case of DSV or
EXCEL files: If the argument |
na |
A string representing missing values in the data file. |
decimal_mark |
A character, defining the decimal separator in numeric
columns. Only the strings |
big_mark |
A character, defining the thousands separator in numeric
columns. Only the strings |
trim_ws |
A logical value, defining if the character values should be stripped of all leading and trailing white spaces. |
n_max |
A number, defining the maximum number of rows to be
read. If |
encoding |
A string, defining which encoding should be assumed when reading the data file. The following values are allowed:
|
adapters |
An optional list argument, holding a list of adapter functions (See section adapters). |
... |
Additional function arguments for
|
sep |
A string holding the column delimiter symbol. |
header |
A logical value, which defines if the first row contains
the data headers. If set to |
rename_cols |
A logical value, which defines if the columns given in
the data file should be overwritten by the columns given in argument
|
sheet |
A string or an integer number:
|
range |
An optional string, holding an EXCEL range string, defining the
data range in the spreadsheet. If |
retype_cols |
A logical value, which defines if the types of the
columns given in the SAS file should be changed to the types given in the
|
A file_definition class object holding all information needed for
reading the data file with read_data().
The function read_data() can read four different types of data files:
FWF: Fixed width files. These are text files in which the data is stored in columns that have a fixed character width.
DSV: Delimiter-separated value files. These are text files in which the data is stored in columns that are separated by a delimiter character.
EXCEL: An Excel file holding the data.
SAS: A SAS file holding the data.
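To make the difference between the two text-based file types concrete, the following sketch reads the same two records once from a fixed width layout and once from a delimiter-separated layout. It deliberately uses only base R (`utils::read.fwf()` and `utils::read.table()`), not the readall functions, and the example data and column names are made up for illustration.

```r
# The same two records stored in the two text-based layouts (made-up data).
fwf_lines <- c("2020AT120",
               "2021DE345")
dsv_lines <- c("year;country;value",
               "2020;AT;120",
               "2021;DE;345")

col_classes <- c("integer", "character", "integer")

# FWF: columns are identified by their character widths (4, 2 and 3).
fwf_data <- utils::read.fwf(
  textConnection(fwf_lines),
  widths = c(4, 2, 3),
  col.names = c("year", "country", "value"),
  colClasses = col_classes
)

# DSV: columns are identified by a delimiter character.
dsv_data <- utils::read.table(
  text = dsv_lines,
  sep = ";",
  header = TRUE,
  colClasses = col_classes
)
```

Both calls yield a data.frame with the columns year, country and value; only the way the column boundaries are described differs.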
In order to read a data file with the function read_data(),
it is useful to create a file_definition or
file_structure class object
holding all the information needed about the data file:
new_file_definition_fwf() or new_file_structure_fwf() for FWF files
new_file_definition_dsv() or new_file_structure_dsv() for DSV files
new_file_definition_excel() or new_file_structure_excel() for Excel files
new_file_definition_sas() or new_file_structure_sas() for SAS files
The goal of the package readall is to read data files. For this
purpose the package offers three different class objects in order to
store meta data about the data files:
file_structure class objects: Objects of this class can be used to define all file type specific information (e.g. column positions, column names, column types, delimiter symbols, rows to skip etc.). The idea is that one file_structure object may be valid for several files and can therefore be used to read multiple data files.
file_definition class objects: Objects of this class contain all information needed to read a single specific data file (path to the data file, file structure etc.). A file_definition class object contains a file_structure, which holds all file type specific information, but also other information that is only valid for this specific file.
file_collection class objects: A file_collection class object is simply a list holding multiple file_definition class objects. A file_collection class object can be used to read several data files at once and concatenate the data into a single data.frame.
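The division of labour between the three classes can be sketched with plain lists in base R. This is only an analogy under stated assumptions, not how readall implements its classes: `structure_dsv`, `definitions` and `read_one()` are hypothetical names, and the files are temporary files created for the example.

```r
# Two small DSV files that share one layout (made-up example data).
paths <- c(tempfile(fileext = ".csv"), tempfile(fileext = ".csv"))
writeLines(c("ID;VALUE", "1;10", "2;20"), paths[1])
writeLines(c("ID;VALUE", "3;30"), paths[2])

# "Structure": file type specific settings that are valid for both files.
structure_dsv <- list(sep = ";", header = TRUE, to_lower = TRUE)

# "Definitions": one per file, combining the shared structure with
# information that is only valid for that file.
definitions <- list(
  list(file_path = paths[1], structure = structure_dsv, extra_col_val = 2020),
  list(file_path = paths[2], structure = structure_dsv, extra_col_val = 2021)
)

# Hypothetical reader for a single definition.
read_one <- function(def) {
  s <- def$structure
  data <- utils::read.table(def$file_path, sep = s$sep, header = s$header)
  if (s$to_lower) names(data) <- tolower(names(data))
  data$year <- def$extra_col_val
  data
}

# "Collection": read every definition and concatenate the results.
all_data <- do.call(rbind, lapply(definitions, read_one))
```

The shared settings are written down once, while the path and the extra column value vary per file, which mirrors the reason given above for separating file_structure from file_definition.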
An adapter function is a function that takes a data.frame as input argument
and returns a modified version of this data.frame.
The adapter functions are stored in an adapters
class object, which is a special list that contains all adapter functions
and a description text for each function. These class objects can be
created by using the function new_adapters().
The adapters class objects can be added to a
file_structure, a file_definition or a file_collection class object.
After reading a data file (by calling read_data(file_definition)),
all adapter functions listed in the adapters argument of the
file_definition class object
will be applied consecutively to the loaded data set.
Adapter functions can be added to an existing
file_structure, file_definition or file_collection class
object by using the function add_adapters().
Adapter functions can be used for several tasks:
adapt the data sets in such a way that they can be concatenated for multiple years
compute new variables from existing variables
fix errors in variables
transform the values of a variable of an older data set, such that it complies with a newer variable definition
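The idea of applying adapter functions one after another can be sketched in base R as follows. The sketch does not use new_adapters() or add_adapters(), whose arguments are not shown here; the plain list `adapters`, the example data and the column names are assumptions made for illustration.

```r
# Each adapter takes a data.frame and returns a modified data.frame.
adapters <- list(
  # fix errors in a variable: negative ages are treated as missing
  function(data) {
    data$age[data$age < 0] <- NA
    data
  },
  # compute a new variable from an existing variable
  function(data) {
    data$is_adult <- data$age >= 18
    data
  }
)

raw <- data.frame(id = 1:3, age = c(25, -1, 12))

# Apply the adapters consecutively, in list order.
adapted <- Reduce(function(data, adapter) adapter(data), adapters, raw)
```

Because each adapter receives the output of the previous one, the order matters: here the second adapter already sees the corrected age column.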
The col_meta class objects are used to store
meta information about single data columns, such as additional column descriptions
and column value/level descriptions. In order to store meta information
about a set of columns, a file_meta class object can be
used. These objects store a list of col_meta class objects, where
each col_meta class object corresponds to a specific column in
a data set. These file_meta class objects are usually
stored in file_structure class objects or
file_definition class objects. When calling read_data(), the
meta information is also appended to the resulting data.frame.
The meta information stored in a file_structure class object,
a file_definition class object or a data.frame returned by read_data()
can be extracted by using the function get_meta().
A col_meta class object holds the following information:
desc: A string holding the column description.
values: A vector (character/logical/numeric) usually holding the possible column values (e.g. c(1, 2)) or a more abstract text version of the column values (e.g. c("JJJJMMDD", "99999999", "")).
values_desc: A character vector that corresponds to the values vector. Each entry of values_desc is a more detailed description of the corresponding entry in values. If some descriptions are not present, the entries are NA values.
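The relation between values and values_desc can be illustrated with a plain list in base R. This is a stand-in, not an actual col_meta class object; `sex_meta` and its contents are made up for the example.

```r
# Plain-list stand-in for the meta information of one column (made-up data).
sex_meta <- list(
  desc = "Sex of the respondent",
  values = c(1, 2, 9),
  values_desc = c("male", "female", NA)  # no description for value 9
)

# values and values_desc correspond entry by entry.
stopifnot(length(sex_meta$values) == length(sex_meta$values_desc))

# Look up the description for each observed column value.
observed <- c(2, 1, 9)
observed_desc <- sex_meta$values_desc[match(observed, sex_meta$values)]
```

A value without a description simply maps to NA, as described for values_desc above.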
read_data(), get_col_names(), get_col_types(), get_file_type(), add_adapters(), get_encoding(), apply_adapters()