Read reads a dataset from a plain text file.
files |
A character vector of the files to be read. Paths can be either absolute or relative to the current working directory of the R session. Each file should be a CSV with the same format. Only the first file is used to determine column names, types, etc. |
attributes |
The specification of the columns being read from the files.
This should be either the name of a relation or a call to c. In the former case, the name is specified in the same format as it is for
Otherwise, the attributes are specified as the arguments of the call to c.
However, in the case that the value of
If the file has more columns than are specified here, extra columns at the end are simply skipped. |
header |
Whether the files contain the names of the columns on the first line. If so, only the header in the first file is used. |
skip |
The number of lines to skip at the beginning of each file. |
nrows |
The maximum number of lines to read. Negative and other invalid values are ignored. |
sep |
The field delimiter, given as a length-one character vector. This single element should be either the word "TAB" or a single ASCII character, escape characters included. For example, "\t", " ", and "\\" (a single backslash) work. The only exception is "\n", because the newline already terminates each line. |
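As a quick check of what counts as a single character here, the following base R sketch confirms that escape sequences such as "\t" and "\\" are each stored as one ASCII character. It uses only base R and does not call Read itself; the vector name seps is just for illustration.

```r
# Candidate delimiters written as R string literals.
seps <- c(tab = "\t", space = " ", backslash = "\\", comma = ",")

# Each escape sequence is stored as exactly one character.
n_chars <- nchar(seps)

# All of them are plain ASCII (code points below 128).
is_ascii <- vapply(seps, function(s) all(utf8ToInt(s) < 128L), logical(1))

print(n_chars)
print(is_ascii)
```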
simple |
Whether to use the simple splitting algorithm, in which quotes are not allowed. If so, the fields are split whenever the delimiter is seen, regardless of whether it is inside a quoted string. The simple algorithm is significantly faster than the quote-aware one and is highly recommended whenever possible. |
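The difference between the two splitting strategies can be illustrated with base R. This is only a sketch of the idea, not the reader's actual implementation: strsplit stands in for the simple algorithm and scan for quote-aware parsing.

```r
line <- '1,"hello, world",3'

# Simple splitting: every delimiter splits, even inside quotes.
simple_fields <- strsplit(line, ",", fixed = TRUE)[[1]]

# Quote-aware splitting: scan() honours the quote character.
quoted_fields <- scan(text = line, what = character(), sep = ",",
                      quote = "\"", quiet = TRUE)

print(simple_fields)
print(quoted_fields)
```

The simple split breaks the quoted string into two fields, which is why it is only safe when no field contains the delimiter.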
quote |
The character used to quote strings, given as a length-one character vector whose single element should be a single ASCII character. Using different opening and closing quote characters, such as "(" and ")", is not supported. |
escape |
The character used to escape the quote character, given as a length-one character vector whose single element should be a single ASCII character. |
trim.cr |
Whether to check for and remove carriage return (CR) characters. On Windows machines, lines typically end with a carriage return before the line feed (LF) character, a.k.a. the newline character. Setting this to true ensures that CRs are not included in the last field. Only LF and CR+LF line endings are currently supported. |
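To see why the CR matters, consider what happens when CR+LF text is split on LF alone. The base R sketch below is an illustration of the problem, not the reader's own code; the helper last_field is hypothetical.

```r
# A CR+LF-terminated file, split on LF only: each line still ends in CR.
raw_text <- "a,b,c\r\n1,2,3\r\n"
lines <- strsplit(raw_text, "\n", fixed = TRUE)[[1]]

# Hypothetical helper: returns the last comma-separated field of a line.
last_field <- function(line) {
  fields <- strsplit(line, ",", fixed = TRUE)[[1]]
  fields[length(fields)]
}

# Without trimming, the CR stays attached to the last field.
untrimmed <- vapply(lines, last_field, character(1), USE.NAMES = FALSE)

# With trimming, a single trailing CR is dropped before splitting.
trimmed <- vapply(sub("\r$", "", lines), last_field, character(1),
                  USE.NAMES = FALSE)
```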
nullable |
An object used to specify the strings for each column that are to be interpreted as ‘NULL’ values, somewhat analogous to the
The null string for each attribute can be either a length-one character vector, whose only element is taken to be the null string, or a length-one logical.
If given as a list, the elements are interpreted in the following order:
1) If named, the name is taken to be the attribute. The value is interpreted as described above.
2) If a list, the element labelled ‘attr’ should be a length-one character vector giving the attribute name. If an element labelled ‘null’ exists, it is interpreted as the null string; if not, then the null string is taken to be
Any other format results in an error. Currently only the (1) format of the list is supported. |
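A minimal sketch of the shape of a format (1) list follows. The attribute names "name" and "score" and the helper is_valid_null_spec are made up for illustration; the helper only checks the shape described above and is not part of the package.

```r
# Format (1): a named list, one element per attribute.
# Attribute names "name" and "score" are hypothetical.
nullable <- list(name = "NULL", score = "NA")

# Hypothetical helper: every element must be named and hold a
# length-one character or a length-one logical.
is_valid_null_spec <- function(spec) {
  nms <- names(spec)
  if (is.null(nms) || any(nms == "")) return(FALSE)
  all(vapply(spec, function(x) {
    length(x) == 1L && (is.character(x) || is.logical(x))
  }, logical(1)))
}

print(is_valid_null_spec(nullable))
print(is_valid_null_spec(list("NULL")))  # unnamed, so not format (1)
```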
MoreArgs |
A list of additional arguments to pass to an inner call to read.table.
Any argument taken by both
Furthermore, arguments may be changed to fully mimic the behavior of the
Grokit CSV Reader, such as to accommodate |
chunk |
The chunk size, to be passed to |
This section deals with the specification of attribute names and types, which
is considerably more complicated than that of read.table. The description of
attributes should be read before continuing.
When given as a call, the specification is quoted and broken apart before
being processed on a per-element basis. Unlike read.table, column names and
types can be specified for some columns and left blank for others, in which
case automatic processing takes over as it does in read.table. To skip a
column name, omit the label of the corresponding argument; to skip a type,
omit the value; to skip both for a column, use a completely empty argument.
For example, if you want to omit the name of the second column, the type of
the third column, and both for the fourth column, c(a = b, c, d=, ) would be
appropriate. In this example, the first column has name “a” and type b. The
second column has type c and is given a generated name, such as “V1”. The
third column is named “d” and has its type deduced from the file. The fourth
column has both a generated name and a deduced type.
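The way such a call can be quoted and broken apart is sketched below in base R. This is an illustration of the general technique under the assumption that omitted types appear as R's empty symbol; it is not taken from the package source, and the variable names are arbitrary.

```r
# Capture the specification without evaluating it.
spec <- quote(c(a = b, c, d = , ))

# Drop the function name; what remains is one element per column.
cols <- as.list(spec)[-1]

# A label is present when the argument name is non-empty.
has_name <- nzchar(names(cols))

# An omitted type shows up as R's empty symbol, quote(expr = ).
# The comparison is done by index to avoid binding the empty symbol
# to a variable, which R would treat as a missing argument.
has_type <- !vapply(seq_along(cols),
                    function(i) identical(cols[[i]], quote(expr = )),
                    logical(1))

print(data.frame(column = seq_along(cols), has_name, has_type))
```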
If you want a single column without specifying either the name or the type,
simply use c(). Normally, converting this call to a list based on its AST
yields no arguments. However, reading in a CSV with zero columns is
nonsensical, so c() is treated as a special case, as there is no other way to
write such a call.
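The reason a special case is needed can be seen directly in base R: an argument list holding exactly one empty argument cannot be written, because it parses the same as an empty argument list. The snippet is a small sketch of that observation only.

```r
# With no arguments, the call consists of the function name alone.
spec <- quote(c())
n_args <- length(as.list(spec)[-1])

# Whitespace between the parentheses does not create an empty argument.
same_parse <- identical(quote(c( )), quote(c()))

print(n_args)
print(same_parse)
```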
In truth, the function being called does not have to be c. A warning is
thrown, but the exact function being called is otherwise ignored. This
tolerates mistakes such as accidentally using list.
The default names for columns are the same as they are in read.table. If the
file header is read, then the corresponding name is used. Otherwise, the name
is “V” followed by the column index, starting at zero.
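The naming rule as stated here can be sketched in a few lines. The helper default_names is hypothetical and simply follows the description above (header names when available, otherwise “V” plus a zero-based index); it is not the package's own code.

```r
# Hypothetical helper mirroring the naming rule described above.
default_names <- function(n_cols, header = NULL) {
  if (!is.null(header)) return(header[seq_len(n_cols)])
  paste0("V", seq_len(n_cols) - 1L)  # index starts at zero
}

generated <- default_names(4)
from_header <- default_names(2, header = c("id", "value"))

print(generated)
print(from_header)
```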
Types are determined by an inner call to read.table. However, strings are
never assumed to be factors. See the MoreArgs argument for more details.
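The kind of type deduction that read.table performs can be previewed on a small in-memory sample. This sketch calls read.table directly with stringsAsFactors = FALSE to match the behavior described above; how the arguments are forwarded internally is not shown here, and the sample data is made up.

```r
# Made-up sample with an integer, a string, and a numeric column.
csv_text <- "1,apple,2.5\n2,banana,3.75"

df <- read.table(text = csv_text, sep = ",", header = FALSE,
                 stringsAsFactors = FALSE)

# One deduced class per column; strings stay character, not factor.
col_types <- unname(vapply(df, class, character(1)))
print(col_types)
```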