The functions read_data(), read_data_fwf(), read_data_dsv() and read_data_excel() are all used to read FWF, DSV or EXCEL data files.
The function read_data() is the heart of the readall package. It requires only a single function argument (a file_definition class object), which holds all the file information needed to read the data file. By passing a file_collection class object into read_data() instead, it is also possible to read multiple data files at once and store the concatenated data sets in a single data.frame.
The functions read_data_fwf(), read_data_dsv() and read_data_excel() are less flexible, but have a more conventional interface: these functions do not use file_definition class objects, but require the user to pass in all file information directly as function arguments.
read_data(file_definition)
read_data_fwf(
file_path,
specification_files = NULL,
cols = NULL,
col_names = NULL,
col_types = NULL,
col_start = NULL,
col_end = NULL,
col_widths = NULL,
sep_width = 0,
skip_rows = 0,
na = "",
decimal_mark = ".",
big_mark = ",",
trim_ws = TRUE,
n_max = Inf,
encoding = "latin1",
to_lower = TRUE,
adapters = new_adapters(),
cols_keep = TRUE,
extra_col_name = NULL,
extra_col_val = NULL,
extra_col_file_path = FALSE,
...
)
read_data_dsv(
file_path,
specification_files = NULL,
cols = NULL,
col_names = NULL,
col_types = NULL,
sep = ";",
header = TRUE,
skip_rows = 0,
na = "",
decimal_mark = ".",
big_mark = ",",
trim_ws = TRUE,
n_max = Inf,
encoding = "latin1",
to_lower = TRUE,
rename_cols = FALSE,
adapters = new_adapters(),
cols_keep = TRUE,
extra_col_name = NULL,
extra_col_val = NULL,
extra_col_file_path = FALSE,
...
)
read_data_excel(
file_path,
specification_files = NULL,
range = NULL,
sheet = NULL,
cols = NULL,
col_names = NULL,
col_types = NULL,
header = TRUE,
skip_rows = 0,
na = "",
trim_ws = TRUE,
n_max = Inf,
to_lower = TRUE,
rename_cols = FALSE,
adapters = new_adapters(),
cols_keep = TRUE,
extra_col_name = NULL,
extra_col_val = NULL,
extra_col_file_path = FALSE,
...
)
read_data_sas(
file_path,
specification_files = NULL,
skip_rows = 0,
n_max = Inf,
encoding = NULL,
to_lower = TRUE,
rename_cols = FALSE,
retype_cols = FALSE,
adapters = new_adapters(),
cols_keep = TRUE,
extra_col_name = NULL,
extra_col_val = NULL,
extra_col_file_path = FALSE,
...
)
|
file_definition |
A file_definition object that holds all information needed for reading the data. This object can be created with one of the following functions:
|
file_path |
A string holding the path to the data file. |
specification_files |
An optional character vector holding the paths to the files, where the file structure is described. |
cols |
An optional list argument, holding the column definitions.
This argument can be used instead of the arguments
|
col_names |
An optional character vector holding the names of the columns.
If omitted, then the strings |
col_types |
A character vector defining the data types for each column.
The following strings are allowed: |
col_start |
An optional numeric vector holding the positions of the first character
of each column.
Generally, the argument |
col_end |
An optional numeric vector holding the positions of the last character
of each column. The last vector entry (for the most right column)
is the only entry that can be |
col_widths |
An optional numeric vector holding the numbers of characters
of each column.
Generally, the argument |
sep_width |
An optional number, defining the number of characters
between each column (often |
skip_rows |
The number of rows to be skipped. In the case of DSV or
EXCEL files: If the argument |
na |
A string representing missing values in the data file. |
decimal_mark |
A character, defining the decimal separator in numeric
columns. Only the strings |
big_mark |
A character, defining the thousands separator in numeric
columns. Only the strings |
trim_ws |
A logical value, defining if the character values should be stripped of all leading and trailing white spaces. |
n_max |
A number, defining the maximum number of rows to be
read. If |
encoding |
A string, defining which encoding should be assumed when reading the data file. The following values are allowed:
|
to_lower |
A logical flag, defining if the names of the columns should
be transformed to lower case after reading the data set (by calling
|
adapters |
An optional list argument, holding a list of adapter functions (See section adapters). |
cols_keep |
Either |
extra_col_name |
An optional string, which defines the column, which
will be added to the data set (after reading it with function |
extra_col_val |
An optional value (any atomic type), which will be added
(after reading the data set with function |
extra_col_file_path |
Either |
... |
Additional function arguments for
|
sep |
A string holding the column delimiter symbol. |
header |
A logical value, which defines if the first row contains
the data headers. If set to |
rename_cols |
A logical value, which defines if the columns given in
the data file should be overwritten by the columns given in argument
|
range |
An optional string, holding an EXCEL range string, defining the
data range in the spreadsheet. If |
sheet |
A string or an integer number:
|
retype_cols |
A logical value, which defines if the types of the
columns given in the SAS file should be changed to the types given in the
|
The function read_data() can either read a single data file and return a data.frame, or it can read multiple data files at once and return the concatenated data sets as a single data.frame.
read_data() can read the following data file types:
FWF: Fixed width files. These files are text files in which the data is stored in columns that have a fixed character width.
DSV: Delimiter-separated value files. These files are text files in which the data is stored in columns that are separated by a delimiter character.
EXCEL: An Excel file holding the data.
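To make the difference between the two text formats concrete, here is a minimal sketch that writes the same small data set once as FWF and once as DSV and reads both back. It uses only base R (utils::read.fwf() and read.table()), not the readall readers, and the file contents and column names are made up for illustration.

```r
# Base-R illustration only; readall's own readers are not used here.

# Fixed width file: "id" is 3 characters, "name" 5 characters, "value" 4 characters.
fwf_path <- tempfile(fileext = ".txt")
writeLines(c("001Alice 1.5",
             "002Bob  22.0"), fwf_path)

fwf_data <- read.fwf(
  fwf_path,
  widths = c(3, 5, 4),
  col.names = c("id", "name", "value"),
  colClasses = c("character", "character", "numeric"),
  strip.white = TRUE  # drop the padding blanks of character columns
)

# Delimiter-separated file: the same data, columns separated by ";".
dsv_path <- tempfile(fileext = ".csv")
writeLines(c("id;name;value",
             "001;Alice;1.5",
             "002;Bob;22.0"), dsv_path)

dsv_data <- read.table(
  dsv_path,
  sep = ";",
  header = TRUE,
  colClasses = c("character", "character", "numeric")
)
```

In the FWF case the column positions have to be supplied by the caller, while in the DSV case they follow from the delimiter. This is why the FWF reader above takes position arguments such as col_start, col_end and col_widths.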
In order to read a single file with read_data(), a file_definition class object must be passed into the function argument file_definition. These file_definition class objects contain all information needed for reading a specific data file.
When calling read_data(file_definition), where file_definition is a file_definition class object, the following tasks will be executed:
Reading the data file specified in file_definition and storing the data in a data.frame.
If the argument to_lower was set to TRUE, replacing all column names of the read data set with their lower-case versions.
If the column names were read from the data file and column names are also given by the col_names argument, comparing the read column names with the column names given in col_names and printing a warning in case of discrepancies.
In the case of SAS files: if the argument col_types was given as well, comparing the read data types of the data columns with the data types given in col_types and printing a warning in case of discrepancies.
Modifying the resulting data.frame by consecutively applying all adapter functions stored in the adapter function list argument file_definition$adapters. For details see section adapters.
Optionally adding a column with value file_definition$extra_col_val and column name file_definition$extra_col_name. For details see new_file_definition().
Optionally adding a character column holding the path of the read data file, with the column name defined in file_definition$extra_col_file_path.
If file_definition$cols_keep is not NULL, only the columns defined in file_definition$cols_keep will be kept. If the attribute is NULL, all columns will be kept.
Finally, the resulting data.frame will be returned.
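The post-read steps listed above can be sketched in base R as follows. The helper post_process() and its test data are hypothetical and only mirror the documented order of operations. This is not the package's implementation, and the comparison warnings for col_names and col_types are left out.

```r
# Hypothetical helper mirroring the documented post-read steps; not readall's code.
post_process <- function(data,
                         to_lower = TRUE,
                         adapters = list(),
                         extra_col_name = NULL,
                         extra_col_val = NULL,
                         cols_keep = NULL) {
  # Step 1: lower-case the column names if requested.
  if (to_lower) names(data) <- tolower(names(data))
  # Step 2: apply all adapter functions one after another.
  for (adapter in adapters) data <- adapter(data)
  # Step 3: optionally add a constant extra column.
  if (!is.null(extra_col_name)) data[[extra_col_name]] <- extra_col_val
  # Step 4: keep only the requested columns (NULL keeps everything).
  if (!is.null(cols_keep)) data <- data[, cols_keep, drop = FALSE]
  data
}

raw <- data.frame(ID = 1:3, Income = c(1000, 2500, 4000))

result <- post_process(
  raw,
  adapters = list(function(df) {
    # Adapters run after lower-casing, so the column is called "income" here.
    df$income_k <- df$income / 1000
    df
  }),
  extra_col_name = "year",
  extra_col_val = 2020,
  cols_keep = c("id", "income_k", "year")
)
```

Because the column filter runs last, cols_keep can refer both to columns created by adapters and to the extra column.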
In order to read multiple data files at once and automatically concatenate the resulting data.frames into a single data.frame, you first need to create a list of file_definition class objects by using the function new_file_collection(). Each list entry holds the meta data of a different data file.
When read_data() is applied to a file_collection class object, the following tasks will be executed:
Looping through the list and applying read_data() to every list entry. Since these entries are file_definition class objects, the tasks of reading a single data file (as described above) will be executed for each list entry.
Concatenating all resulting data.frames into a single data.frame.
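The loop-then-concatenate pattern described above looks roughly like this in base R. Plain lists stand in for file_definition objects, and make_file() and read_one() are hypothetical helpers, so this is an illustration of the idea rather than the behaviour of new_file_collection() or read_data().

```r
# Plain lists stand in for file_definition objects; this is not readall's class system.

# Hypothetical helper: write some lines to a temporary file and return its path.
make_file <- function(lines) {
  path <- tempfile(fileext = ".csv")
  writeLines(lines, path)
  path
}

definitions <- list(
  list(file_path = make_file(c("id;value", "1;10", "2;20")), year = 2019),
  list(file_path = make_file(c("id;value", "3;30")), year = 2020)
)

# Read a single file and tag its rows with the year from the definition.
read_one <- function(def) {
  data <- read.table(def$file_path, sep = ";", header = TRUE)
  data$year <- def$year
  data
}

# Loop over all definitions, then stack the resulting data.frames.
all_data <- do.call(rbind, lapply(definitions, read_one))
```

Stacking with rbind() only works if every data.frame has the same column names, which is one reason to harmonize the individual data sets before they are concatenated.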
A data.frame holding the read data.
The function read_data() can read four different types of data files:
FWF: Fixed width files. These files are text files in which the data is stored in columns that have a fixed character width.
DSV: Delimiter-separated value files. These files are text files in which the data is stored in columns that are separated by a delimiter character.
EXCEL: An Excel file holding the data.
SAS: A SAS file holding the data.
In order to read a data file with the function read_data(), it is useful to create a file_definition or file_structure class object that holds all needed information about the data file:
new_file_definition_fwf() or new_file_structure_fwf() for FWF files
new_file_definition_dsv() or new_file_structure_dsv() for DSV files
new_file_definition_excel() or new_file_structure_excel() for Excel files
new_file_definition_sas() or new_file_structure_sas() for SAS files
The goal of the package readall is to read data files. For this purpose the package offers three different class objects for storing meta data about the data files:
file_structure class objects: Objects of this class can be used to define all file type specific information (e.g. column positions, column names, column types, delimiter symbols, rows to skip etc.). The idea is that one file_structure object may be valid for several files and can therefore be used to read multiple data files.
file_definition class objects: Objects of this class contain all information needed to read a single specific data file (path to the data file, file_structure etc.). A file_definition class object contains a file_structure, which holds all file type specific information, as well as other information that is only valid for this specific file.
file_collection class objects: A file_collection class object is simply a list holding multiple file_definition class objects. A file_collection class object can be used to read several data files at once and concatenate the data into a single data.frame.
An adapter function is a function that takes a data.frame as input argument and returns a modified version of this data.frame. The adapter functions are stored in an adapters class object, which is a special list that contains all adapter functions and a description text for each function. These class objects can be created by using the function new_adapters().
The adapters class objects can be added to a file_structure, a file_definition or a file_collection class object. After reading a data file (by calling read_data(file_definition)), all adapter functions listed in the adapters argument of the file_definition class object will be applied consecutively to the loaded data set.
Adapter functions can be added to an existing file_structure, file_definition or file_collection class object by using the function add_adapters().
Adapter functions can be used for several tasks:
adapt the data sets in such a way that they can be concatenated for multiple years
compute new variables from existing variables
fix errors in variables
transform the values of a variable of an older data set, such that it complies with a newer variable definition
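As an illustration of the last use case, the sketch below adapts an older data set to a newer variable definition so that both years can be concatenated. The adapters are ordinary R functions from data.frame to data.frame. The data sets, the column names and the helper apply_adapters() are hypothetical, and new_adapters() and add_adapters() are deliberately not used because their argument lists are not shown in this page.

```r
# Hypothetical yearly data sets with diverging variable definitions.
data_2019 <- data.frame(id = 1:2, sex_code = c(1, 2))                             # older: numeric codes
data_2020 <- data.frame(id = 3:4, sex = c("f", "m"), stringsAsFactors = FALSE)    # newer: letters

# Adapter 1: rename the old column to the new column name.
rename_sex <- function(df) {
  names(df)[names(df) == "sex_code"] <- "sex"
  df
}

# Adapter 2: recode the old numeric codes (1, 2) to the new values ("m", "f").
recode_sex <- function(df) {
  df$sex <- c("m", "f")[df$sex]
  df
}

# Hypothetical helper: apply a list of adapters consecutively, in list order.
apply_adapters <- function(df, adapters) {
  Reduce(function(current, adapter) adapter(current), adapters, df)
}

data_2019 <- apply_adapters(data_2019, list(rename_sex, recode_sex))
combined <- rbind(data_2019, data_2020)
```

The order of the adapters matters here: recode_sex() expects the column to be called sex already, so it has to run after rename_sex().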