read_naaccr: Read NAACCR records

Description

Read and parse cancer incidence records according to a NAACCR format. read_naaccr returns a data set suited for analysis in R, and read_naaccr_plain returns a data set with the unchanged record values.

Usage

read_naaccr_plain(input, version = NULL, format = NULL,
  keep_fields = NULL, skip = 0, nrows = Inf, buffersize = 10000,
  encoding = getOption("encoding"))

read_naaccr(input, version = NULL, format = NULL, keep_fields = NULL,
  keep_unknown = FALSE, skip = 0, nrows = Inf, buffersize = 10000,
  encoding = getOption("encoding"), ...)

Arguments

input

Either a string with a file name (containing no \n character), a connection object, or the text records themselves as a character vector.

version

An integer specifying the NAACCR format version for parsing the records. Use this or format, not both. If both version and format are NULL (default), the most recent NAACCR format will be used.

format

A record_format object for parsing the records.

keep_fields

Character vector of XML field names to keep in the dataset. If NULL (default), all columns are kept.

skip

An integer specifying the number of lines of the data file to skip before beginning to read data.

nrows

A number specifying the maximum number of records to read. Inf (the default) means "all records."

buffersize

Maximum number of lines to read at one time.

encoding

String giving the input's encoding. See the 'Encoding' section of file in the base package.

keep_unknown

Logical. If TRUE, values of "unknown" are kept as a level in the factor. If FALSE (default), they are replaced with NA.

...

Additional arguments passed on to as.naaccr_record.
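
As a rough illustration of the input, skip, and nrows arguments described above, the sketch below reads the synthetic file used in the Examples section in three ways. It assumes the naaccr package is installed; the object names (from_file, from_text, subset_recs) are made up for this sketch.

```r
library(naaccr)

# Synthetic records shipped with the package (same file as in the Examples)
incfile <- system.file(
  "extdata", "synthetic-naaccr-18-abstract.txt",
  package = "naaccr"
)
fields <- c("ageAtDiagnosis", "sex")

# input given as a file name
from_file <- read_naaccr_plain(incfile, version = 18, keep_fields = fields)

# input given as the text records themselves (one record per element)
record_lines <- readLines(incfile)
from_text <- read_naaccr_plain(record_lines, version = 18, keep_fields = fields)

# Skip the first line of the file, then read at most two records
subset_recs <- read_naaccr_plain(
  incfile,
  version = 18, keep_fields = fields,
  skip = 1, nrows = 2
)
nrow(subset_recs)
```

Only version is supplied here, since the argument descriptions say to use version or format, not both.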

Details

Anyone who wants to analyze the records in R should use read_naaccr. In the returned data.frame, columns are of appropriate classes, coded values are replaced with factors, and unknowns are replaced with NA (unless keep_unknown = TRUE).

read_naaccr_plain is a "format strict" way to read incidence records. All values returned are the literal character values from the records. The only processing done is that leading and trailing whitespace is trimmed. This is useful if the values will be passed to other software that expects the plain NAACCR values.
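
One way to see the difference between the two functions is to read the same file with each and compare the column classes. This is a minimal sketch that assumes the naaccr package is installed; the object names (parsed, plain, parsed_unknown) are made up for illustration, and no particular output is claimed.

```r
library(naaccr)

# Synthetic records shipped with the package (same file as in the Examples)
incfile <- system.file(
  "extdata", "synthetic-naaccr-18-abstract.txt",
  package = "naaccr"
)
fields <- c("ageAtDiagnosis", "sex")

# Parsed for analysis: classes depend on each field's type
parsed <- read_naaccr(incfile, version = 18, keep_fields = fields)
lapply(parsed, class)

# "Format strict": every column should be a character vector
plain <- read_naaccr_plain(incfile, version = 18, keep_fields = fields)
vapply(plain, is.character, logical(1))

# Keep "unknown" as a factor level instead of converting it to NA
parsed_unknown <- read_naaccr(
  incfile,
  version = 18, keep_fields = fields, keep_unknown = TRUE
)
levels(parsed_unknown[["sex"]])
```

The plain result is the one to pass to other software that expects the unmodified NAACCR values.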

Value

For read_naaccr, a data.frame of the records. The columns included depend on the NAACCR record format version. Columns are atomic vectors; there are too many to describe them all.

For read_naaccr_plain, a data.frame with the columns defined by the record format (selected through version or format), limited to keep_fields when it is given. All columns are character vectors.

Note

Some of the parameter text was shamelessly copied from the read.table and read.fwf help pages.

See Also

naaccr_record

Examples

  # This file has synthetic abstract records
  incfile <- system.file(
    "extdata", "synthetic-naaccr-18-abstract.txt",
    package = "naaccr"
  )
  fields <- c("ageAtDiagnosis", "sex", "sequenceNumberCentral")
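  # Plain read: every column keeps the literal character values
  plain <- read_naaccr_plain(incfile, version = 18, keep_fields = fields)
  plain
  # Parsed read: columns are converted to classes suited for analysis
  recs <- read_naaccr(incfile, version = 18, keep_fields = fields)
  recs
  # Note sequenceNumberCentral has been split in two: a number and a flag
  summary(recs[["sequenceNumberCentral"]])
  summary(recs[["sequenceNumberCentralFlag"]])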

naaccr documentation built on Jan. 11, 2020, 9:17 a.m.