fparse: Parse Format Definitions from 'SAS'-like Text

View source: R/format_parse.R

fparse    R Documentation

Parse Format Definitions from 'SAS'-like Text

Description

Reads format definitions written in a human-friendly 'SAS'-like syntax and returns a list of ks_format and/or ks_invalue objects. All parsed formats are automatically stored in the global format library.

Usage

fparse(text = NULL, file = NULL, verbose = FALSE)

Arguments

text

Character string or character vector containing format definitions. If a character vector is supplied, its elements are joined with newlines before parsing.

file

Path to a text file containing format definitions. Exactly one of text or file must be provided.

verbose

Logical. If TRUE, the parsed formats are printed to the console. The default, FALSE, suppresses printing; the result is returned invisibly.
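
As a minimal sketch of the file argument, the snippet below writes a small definition to a temporary file and parses it from there. The format name yn, its contents, and the .txt extension are illustrative assumptions, and the code assumes the package is attached under the name ksformat.

```r
library(ksformat)  # assumption: the package is installed under this name

# Write a hypothetical definition file to a temporary location
path <- tempfile(fileext = ".txt")
writeLines(c(
  'VALUE yn (character)',
  '  "Y" = "Yes"',
  '  "N" = "No"',
  ';'
), path)

# Supply file only; text is left at its default of NULL
fmts <- fparse(file = path)

# The parsed format is registered in the global library, so it can be used by name
fput(c("Y", "N"), "yn")

# Clean up the library and the temporary file
fclear()
unlink(path)
```

Because exactly one of text or file must be provided, the call passes file alone.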

Details

The syntax supports two block types:

VALUE blocks define formats (value -> label):

VALUE name (type)
  "value1" = "Label 1"
  "value2" = "Label 2"
  [low, high) = "Range Label (half-open)"
  (low, high] = "Range Label (open-low, closed-high)"
  .missing = "Missing Label"
  .other = "Other Label"
;

INVALUE blocks define reverse formats (label -> numeric value):

INVALUE name
  "Label 1" = 1
  "Label 2" = 2
;

Syntax rules:

  • Each block starts with the VALUE or INVALUE keyword and ends with a semicolon (;)

  • The type in parentheses is optional; it defaults to "auto" for VALUE and "numeric" for INVALUE

  • Values can be quoted or unquoted

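  • Ranges use interval notation with explicit bounds: a square bracket includes the endpoint and a parenthesis excludes it, as in [low, high) and (low, high]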

  • Legacy range syntax low - high is also supported

  • Special range keywords: LOW (-Inf) and HIGH (Inf)

  • .missing and .other are special directives

  • Lines starting with /*, *, //, or # are comments
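
The rules above can be sketched together in one definition text. This is an illustration only: the format names bmi and grade and their cut points are made up, and the exact endpoint handling of the legacy low - high form is not stated here, so no output is shown.

```r
library(ksformat)  # assumption: the package is installed under this name

fmts <- fparse(text = '
# Lines starting with a hash are treated as comments
VALUE bmi (numeric)
  [LOW, 18.5) = "Underweight"
  [18.5, 25)  = "Normal"
  [25, HIGH]  = "Overweight or above"
  .missing    = "Not recorded"
;

// The type is omitted here, so it falls back to "auto";
// ranges use the legacy dash form
VALUE grade
  0 - 59   = "Fail"
  60 - 100 = "Pass"
  .other   = "Out of range"
;
')

# Apply the numeric format by name
fputn(c(17, 22, 30, NA), "bmi")
fclear()
```

The bmi block uses the LOW and HIGH keywords for unbounded ends, while grade relies on the default type and the .other directive for values that match no range.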

Block options:

Comma-separated options can be placed inside the parentheses after the type:

  • nocase — enables case-insensitive key matching (equivalent to ignore_case = TRUE in fnew).

  • multilabel — allows overlapping ranges where a single value matches multiple labels (used with fput_all).

Options can be combined: VALUE name (character, nocase, multilabel).

Value

A named list of ks_format and/or ks_invalue objects. Names correspond to the format names defined in the text. All formats are automatically registered in the global format library.
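
Since the result is returned invisibly by default, it has to be assigned to be inspected. The sketch below, using illustrative format names, captures the list and looks at its names and at the class of one element; the exact class vector is not documented here, so it is printed rather than assumed.

```r
library(ksformat)  # assumption: the package is installed under this name

# Assign the result; with verbose = FALSE nothing is printed
fmts <- fparse(text = '
VALUE sex (character)
  "M" = "Male"
  "F" = "Female"
;

INVALUE race_inv
  "White" = 1
  "Black" = 2
;
')

# Names correspond to the format names defined in the text
names(fmts)

# Individual definitions can be pulled out of the list by name
class(fmts[["sex"]])
class(fmts[["race_inv"]])

fclear()
```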

Examples

# Parse multiple format definitions from text
fparse(text = '
VALUE sex (character)
  "M" = "Male"
  "F" = "Female"
  .missing = "Unknown"
;

VALUE age (numeric)
  [0, 18)    = "Child"
  [18, 65)   = "Adult"
  [65, HIGH] = "Senior"
  .missing   = "Age Unknown"
;

// Invalue block
INVALUE race_inv
  "White" = 1
  "Black" = 2
  "Asian" = 3
;
')

fput(c("M", "F", NA), "sex")
fputn(c(5, 25, 70, NA), "age")
finputn(c("White", "Black"), "race_inv")
flist()
fprint()
fclear()

# Parse date/time/datetime format definitions
fparse(text = '
VALUE enrldt (date)
  pattern = "DATE9."
  .missing = "Not Enrolled"
;

VALUE visit_time (time)
  pattern = "TIME8."
;

VALUE stamp (datetime)
  pattern = "DATETIME20."
;
')

fput(as.Date("2025-03-01"), "enrldt")
fput(36000, "visit_time")
fput(as.POSIXct("2025-03-01 10:00:00", tz = "UTC"), "stamp")
fclear()

# Case-insensitive format (nocase option)
fparse(text = '
VALUE yesno (character, nocase)
  "Y" = "Yes"
  "N" = "No"
  .other = "Unknown"
;
')
fput(c("y", "N", "YES"), "yesno")
# [1] "Yes" "No" "Unknown"
fclear()

# Parse multilabel format
fparse(text = '
VALUE risk (numeric, multilabel)
  [0, 3]  = "Low Risk"
  [0, 7]  = "Monitored"
  (3, 7]  = "Medium Risk"
  (7, 10] = "High Risk"
;
')
fput_all(c(2, 5, 9), "risk")
fclear()

ksformat documentation built on May 21, 2026, 9:07 a.m.