read_naaccr_plain | R Documentation |
Read and parse cancer incidence records according to a NAACCR format from either fixed-width files (read_naaccr and read_naaccr_plain) or XML documents (read_naaccr_xml and read_naaccr_xml_plain).
read_naaccr_plain(
input,
version = NULL,
format = NULL,
keep_fields = NULL,
skip = 0,
nrows = Inf,
buffersize = 10000,
encoding = getOption("encoding")
)
read_naaccr(
input,
version = NULL,
format = NULL,
keep_fields = NULL,
keep_unknown = FALSE,
skip = 0,
nrows = Inf,
buffersize = 10000,
encoding = getOption("encoding"),
...
)
read_naaccr_xml_plain(
input,
version = NULL,
format = NULL,
keep_fields = NULL,
as_text = FALSE,
encoding = getOption("encoding")
)
read_naaccr_xml(
input,
version = NULL,
format = NULL,
keep_fields = NULL,
keep_unknown = FALSE,
as_text = FALSE,
encoding = getOption("encoding"),
...
)
input |
Either a string with a file name (containing no |
version |
An integer specifying the NAACCR format version for parsing
the records. Use this or |
format |
A |
keep_fields |
Character vector of XML field names to keep in the
dataset. If |
skip |
An integer specifying the number of lines of the data file to skip before beginning to read data. |
nrows |
A number specifying the maximum number of records to read.
|
buffersize |
Maximum number of lines to read at one time. |
encoding |
String giving the input's encoding. See the 'Encoding'
section of |
keep_unknown |
Logical indicating whether values of "unknown" should be
a level in the factor or |
... |
Additional arguments passed onto |
as_text |
Logical indicating (if |
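The skip and nrows arguments described above can be combined to read a fixed-width file in slices. The following is a minimal sketch, assuming the naaccr package is installed and that the bundled synthetic file has one record per line; the names first5 and next5 are only illustrative.

```r
library(naaccr)

# Synthetic abstract records shipped with the package
incfile <- system.file(
  "extdata", "synthetic-naaccr-18-abstract.txt",
  package = "naaccr"
)
fields <- c("ageAtDiagnosis", "sex")

# Read at most the first five records
first5 <- read_naaccr_plain(
  incfile, version = 18, keep_fields = fields, nrows = 5
)

# Skip the first five lines, then read at most five more records
next5 <- read_naaccr_plain(
  incfile, version = 18, keep_fields = fields, skip = 5, nrows = 5
)

nrow(first5)
nrow(next5)
```

Reading in slices like this may help when a file is too large to hold in memory at once.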
read_naaccr and read_naaccr_xml return data sets suited for analysis in R.
read_naaccr_plain and read_naaccr_xml_plain return data sets with the unchanged record values.
Anyone who wants to analyze the records in R should use read_naaccr or read_naaccr_xml.
In the returned naaccr_record, columns are of appropriate classes, coded values are replaced with factors, and unknowns are replaced with NA.
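To see the effect of this processing, one can inspect the classes and missing-value counts of the returned columns. This is a sketch only, assuming the naaccr package and its bundled synthetic file are available; the exact classes shown depend on the fields requested.

```r
library(naaccr)

incfile <- system.file(
  "extdata", "synthetic-naaccr-18-abstract.txt",
  package = "naaccr"
)
fields <- c("ageAtDiagnosis", "sex", "sequenceNumberCentral")
recs <- read_naaccr(incfile, version = 18, keep_fields = fields)

# First class label of each column; expect a mix of classes
# rather than all "character"
vapply(recs, function(x) class(x)[1], character(1))

# Number of NA values per column, which includes values coded as unknown
colSums(is.na(recs))
```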
read_naaccr_plain and read_naaccr_xml_plain are "format strict" ways to read incidence records.
All values returned are the literal character values from the records.
The only processing done is that leading and trailing whitespace is trimmed.
This is useful if the values will be passed to other software that expects
the plain NAACCR values.
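As a rough illustration of handing the plain values to other software, the sketch below reads a few fields as character columns and writes them to a CSV file. It assumes the naaccr package is installed; outfile is a hypothetical temporary path chosen for this example, and CSV is just one possible exchange format.

```r
library(naaccr)

incfile <- system.file(
  "extdata", "synthetic-naaccr-18-abstract.txt",
  package = "naaccr"
)
fields <- c("ageAtDiagnosis", "sex")
plain <- read_naaccr_plain(incfile, version = 18, keep_fields = fields)

# Every column should be a character vector holding the literal codes
all(vapply(plain, is.character, logical(1)))

# Write the untouched codes for a downstream tool (hypothetical destination)
outfile <- tempfile(fileext = ".csv")
write.csv(plain, outfile, row.names = FALSE)
```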
For read_naaccr_plain and read_naaccr, if the version and format arguments are left NULL, the default format is version 18.
This was the last format to be used for fixed-width files.
For read_naaccr, a data.frame of the records.
The columns included depend on the NAACCR record_format version.
Columns are atomic vectors; there are too many to describe them all.
For read_naaccr_plain, a data.frame based on the record_format specified by either the version or format argument.
The names of the columns will be those in the format's name column.
All columns are character vectors.
Some of the parameter text was shamelessly copied from the read.table and read.fwf help pages.
North American Association of Central Cancer Registries (October 2018). Standards for Cancer Registries Volume II: Data Standards and Data Dictionary. Twenty-first edition. https://apps.naaccr.org/data-dictionary/.
North American Association of Central Cancer Registries (April 2019). NAACCR XML Data Exchange Standard. Version 1.4. https://www.naaccr.org/xml-data-exchange-standard/.
naaccr_record
# This file has synthetic abstract records
incfile <- system.file(
"extdata", "synthetic-naaccr-18-abstract.txt",
package = "naaccr"
)
fields <- c("ageAtDiagnosis", "sex", "sequenceNumberCentral")
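# Processed records: typed columns, factors, and NA for unknowns
recs <- read_naaccr(incfile, version = 18, keep_fields = fields)
recs
# Plain records: literal character values only
plain <- read_naaccr_plain(incfile, version = 18, keep_fields = fields)
plain
# Note that in the processed records, sequenceNumberCentral has been
# split in two: a number and a flag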
summary(recs[["sequenceNumberCentral"]])
summary(recs[["sequenceNumberCentralFlag"]])