record_format: Define custom fields for NAACCR records


record_format R Documentation

Define custom fields for NAACCR records

Description

Create a record_format object, which is used to read NAACCR records.

Usage

record_format(
  name,
  item,
  start_col = NA_integer_,
  end_col = NA_integer_,
  type = "character",
  alignment = "left",
  padding = " ",
  parent = "Tumor",
  cleaner = list(NULL),
  unknown_finder = list(NULL),
  name_literal = NA_character_,
  width = NA_integer_
)

as.record_format(x, ...)

Arguments

name

Item name appropriate for a data.frame column name.

item

NAACCR item number.

start_col

First column of the field in a fixed-width record.

end_col

*Deprecated: Use the width parameter instead.* Last column of the field in a fixed-width record.

type

Name of the column class.

alignment

Alignment of the field in fixed-width files. Either "left" (default) or "right".

padding

Single-character strings to use for padding in fixed-width files.

parent

Name of the parent node to include this field under when writing to an XML file. Values can be "NaaccrData", "Patient", "Tumor" (default), or NA. Fields with NA for parent won't be included in an XML file.

cleaner

(Optional) List of functions to handle special cases of cleaning field data (e.g., convert all values to uppercase). Values of NULL (the default) mean the default cleaning function for the type is used. The value can also be the name of a function to retrieve with getFunction. See Details.

unknown_finder

(Optional) List of functions to detect when codes mean the actual values are unknown or not applicable. Values of NULL (the default) mean the default unknown finding function for the type is used. The value can also be the name of a function to retrieve with getFunction. See Details.

name_literal

(Optional) Item name in plain language.

width

(Optional) Item width in characters.

x

Object to be coerced to a record_format, usually a data.frame or list.

...

Other arguments passed to record_format.
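
The cleaner and unknown_finder arguments accept either function objects or the names of functions to look up with getFunction. The following is a minimal base R sketch of how a name resolves to a function. It is only an illustration, not code from the package, and the variable names are hypothetical.

```r
# Illustration only: shows name-to-function lookup, not naaccr internals.
library(methods)  # provides getFunction()

# The same cleaner given by name and as a function object
cleaner_by_name <- "toupper"
cleaner_as_function <- toupper

# Look up the function that the name refers to
resolved <- getFunction(cleaner_by_name)

# Both forms give the same cleaned values
raw_values <- c("abc", "Def")
resolved(raw_values)
cleaner_as_function(raw_values)
```

Passing a name can be convenient when the format is stored in a plain table, since a character column can hold function names but not function objects.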

Details

To define registry-specific fields in addition to the standard fields, create a record_format object for the registry-specific fields and combine it with one of the formats provided with the package using rbind.

Value

An object of class "record_format" which has the following columns:

name

(character) XML field name.

item

(integer) Field item number.

start_col

(integer) First column of the field in a fixed-width text file. If NA, the field will not be read from or written to fixed-width files. It will still be included in XML files.

end_col

(integer) (*Deprecated: Use width instead.*) Last column of the field in a fixed-width text file. If NA, the field will not be read from or written to fixed-width files. This is the norm for fields only found in XML formats.

type

(factor) R class for the column vector.

alignment

(factor) Alignment of the field's values in a fixed-width text file.

padding

(character) String used for padding field values in a fixed-width text file.

parent

(factor) Parent XML node for the field. One of "NaaccrData", "Patient", or "Tumor".

cleaner

(list of function objects) Function to prepare the field's values for analysis. Values of NULL will use the standard cleaner functions for the type (see below).

unknown_finder

(list of function objects) Function to detect codes meaning the actual values are missing or unknown for the field.

name_literal

(character) Field name in plain language.

width

(integer) Character width of the field values. Mostly meant for reading and writing flat files.
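
As a rough illustration of how start_col and width locate a field in a fixed-width record, the base R sketch below extracts two fields from one line. The layout and the record line are made up, and the relation end_col = start_col + width - 1 is assumed from the column descriptions above rather than taken from the package's code.

```r
# Hypothetical layout: field names and positions are invented for illustration.
fields <- data.frame(
  name      = c("id", "code"),
  start_col = c(1L, 6L),
  width     = c(5L, 3L),
  stringsAsFactors = FALSE
)

# Assumed relation between the deprecated end_col and width
fields$end_col <- fields$start_col + fields$width - 1L

# One made-up fixed-width record, padded with spaces on the right
record_line <- "00042C50  "

# Cut each field out of the line by its first and last column
values <- substring(record_line, fields$start_col, fields$end_col)
names(values) <- fields$name
values
```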

Format Types

The levels that type can take, along with the functions used to process values of each type when reading a file:

address

(clean_address_number_and_street) Street number and street name parts of an address.

age

(clean_age) Age in years.

boolean01

(naaccr_boolean, with false_value = "0") True/false, where "0" means false and "1" means true.

boolean12

(naaccr_boolean, with false_value = "1") True/false, where "1" means false and "2" means true.

census_block

(clean_census_block) Census Block ID number.

census_tract

(clean_census_tract) Census Tract ID number.

character

(clean_text) Miscellaneous text.

city

(clean_address_city) City name.

count

(clean_count) Integer count.

county

(clean_county_fips) County FIPS code.

Date

(as.Date, with format = "%Y%m%d") NAACCR-formatted date (YYYYMMDD).

datetime

(as.POSIXct, with format = "%Y%m%d%H%M%S") NAACCR-formatted datetime (YYYYMMDDHHMMSS).

facility

(clean_facility_id) Facility ID number.

icd_9

(clean_icd_9_cm) ICD-9-CM code.

icd_code

(clean_icd_code) ICD-9 or ICD-10 code.

integer

(as.integer) Miscellaneous whole number.

numeric

(as.numeric) Miscellaneous decimal number.

override

(naaccr_override) Field describing why another field's value was overridden.

physician

(clean_physician_id) Physician ID number.

postal

(clean_postal) Postal code for an address (a.k.a. ZIP code in the United States).

ssn

(clean_ssn) Social Security Number.

telephone

(clean_telephone) 10-digit telephone number.
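
The Date and datetime types listed above rely on base R parsers with fixed format strings. A minimal sketch of those two calls on made-up values follows. The tz = "UTC" argument is an assumption added here to keep the result reproducible and is not stated in the documentation above.

```r
# Made-up NAACCR-style values
naaccr_date <- "20240913"
naaccr_datetime <- "20240913010700"

# Parse with the format strings given for the Date and datetime types
parsed_date <- as.Date(naaccr_date, format = "%Y%m%d")
parsed_datetime <- as.POSIXct(
  naaccr_datetime,
  format = "%Y%m%d%H%M%S",
  tz = "UTC"  # assumption: a fixed time zone for a reproducible result
)

# Show the parsed values in a more readable layout
format(parsed_date, "%Y-%m-%d")
format(parsed_datetime, "%Y-%m-%d %H:%M:%S")
```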

Examples

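  # Define three registry-specific fields. "baz" has no start_col, so it is
  # not read from or written to fixed-width files.
  my_fields <- record_format(
    name      = c("foo", "bar", "baz"),
    item      = c(2163, 1180, 1181),
    start_col = c(975, 1381, NA),
    width     = c(1, 55, 4),
    type      = c("numeric", "facility", "character"),
    parent    = c("Patient", "Tumor", "Tumor"),
    cleaner   = list(NULL, NULL, trimws)  # NULL uses the type's default cleaner
  )
  # Append the custom fields to a format provided with the package
  my_format <- rbind(naaccr_format_16, my_fields)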

naaccr documentation built on Sept. 13, 2024, 1:07 a.m.