read_symbolic_csv: Read a Symbolic Data CSV File

View source: R/utilities.R

read_symbolic_csv    R Documentation

Description

Reads an external CSV file containing symbolic data, automatically detects whether the data is interval-valued (paired min/max columns or comma-separated pairs), histogram-valued, modal-valued, or of another symbolic type, and returns the result as a data.frame.

Usage

read_symbolic_csv(
  file,
  sep = ",",
  header = TRUE,
  row.names = NULL,
  stringsAsFactors = FALSE,
  na.strings = c("", "NA"),
  symbolic_type = NULL,
  ...
)

Arguments

file

Path to the CSV file to read.

sep

Field separator character. Default ",".

header

Logical; does the first row contain column names? Default TRUE.

row.names

Column number or character string giving row names. Passed to read.table. Default NULL (automatic).

stringsAsFactors

Logical; should character columns be converted to factors? Default FALSE.

na.strings

Character vector of strings to interpret as NA. Default c("", "NA").

symbolic_type

Optional character string to override automatic type detection. One of "interval", "histogram", "modal", or "other". When NULL (the default) the type is detected automatically.

...

Additional arguments passed to read.table.
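
As a rough illustration of what these arguments control, the sketch below reads a small semicolon-separated file with base read.table using the same values as the defaults listed above. It assumes, without confirmation from this page, that read_symbolic_csv forwards sep, header, stringsAsFactors and na.strings to read.table unchanged; only row.names and ... are documented as being passed through. The file contents are made up for the example.

```r
# Illustrative only: a plain read.table() call, not read_symbolic_csv() itself.
tmp <- tempfile(fileext = ".csv")
writeLines(c("id;x_min;x_max",
             "a;1.2;3.4",
             "b;;5"),          # empty field, read as NA
           tmp)

raw <- read.table(tmp,
                  sep = ";",                 # non-default separator
                  header = TRUE,             # first row holds column names
                  row.names = NULL,          # number the rows automatically
                  stringsAsFactors = FALSE,  # keep character columns as character
                  na.strings = c("", "NA"))  # strings treated as missing

str(raw)
unlink(tmp)
```

Note that read.table's own check.names argument defaults to TRUE, which would rewrite a header such as Crime(violent) into a syntactically valid name. Whether read_symbolic_csv overrides that default is not stated here.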

Details

The detection heuristic works as follows:

  1. Interval (MM): If the file contains paired _min/_max columns the data is returned as-is (MM format).

  2. Interval (iGAP): If one or more character columns contain comma-separated numeric pairs (e.g., "1.2,3.4") they are expanded into _min/_max column pairs and the result is returned in MM format.

  3. Histogram / Modal: If columns follow a VarName(bin) naming pattern (e.g., Crime(violent)) and the proportions within each variable group sum to approximately 1, the data is classified as histogram or modal and returned as a plain data.frame.

  4. Other: If none of the above patterns match, the data is returned as a plain data.frame.
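
The four checks above can be sketched in base R roughly as follows. This is a minimal illustration, not the package's implementation: the function name sketch_detect_type, the tolerance tol, the regular expressions, and the returned labels are all assumptions made for the example.

```r
# Hypothetical helper; NOT the dataSDA implementation.
sketch_detect_type <- function(df, tol = 0.01) {
  nms <- names(df)

  # 1. Interval (MM): every <var>_min column has a matching <var>_max column.
  min_vars <- sub("_min$", "", grep("_min$", nms, value = TRUE))
  max_vars <- sub("_max$", "", grep("_max$", nms, value = TRUE))
  if (length(min_vars) > 0 && setequal(min_vars, max_vars)) {
    return("interval_mm")
  }

  # 2. Interval (iGAP): a character column whose non-missing values
  #    all look like "number,number".
  num  <- "-?[0-9]*\\.?[0-9]+"
  pair <- paste0("^[[:space:]]*", num, "[[:space:]]*,[[:space:]]*",
                 num, "[[:space:]]*$")
  is_pair_col <- vapply(df, function(x) {
    is.character(x) && any(!is.na(x)) && all(grepl(pair, x[!is.na(x)]))
  }, logical(1))
  if (any(is_pair_col)) {
    return("interval_igap")
  }

  # 3. Histogram / modal: columns named VarName(bin) whose values sum
  #    to roughly 1 in every row, within each variable group.
  binned <- grepl("^.+\\(.+\\)$", nms)
  if (any(binned)) {
    groups <- sub("\\(.*$", "", nms[binned])
    sums_ok <- vapply(split(nms[binned], groups), function(cols) {
      all(abs(rowSums(df[cols]) - 1) < tol)
    }, logical(1))
    if (all(sums_ok)) {
      return("histogram_or_modal")
    }
  }

  # 4. Other: nothing matched.
  "other"
}

# Example input with VarName(bin) column names (made-up values).
h <- data.frame(a = c(0.3, 0.5), b = c(0.7, 0.5))
names(h) <- c("Crime(violent)", "Crime(other)")
sketch_detect_type(h)
```

The order of the checks mirrors the list: paired _min/_max columns take precedence, so a file that has both interval pairs and VarName(bin) columns would be labelled as interval in this sketch.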

Value

A data.frame. Interval data is returned in MM format (paired _min/_max columns). All other symbolic types are returned as plain data frames.
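
To make the MM layout concrete, the sketch below expands comma-separated pairs such as "1.2,3.4" into paired _min/_max columns. The helper sketch_igap_to_mm and the sample data are hypothetical and are not part of dataSDA; the package may perform this expansion differently.

```r
# Hypothetical helper; not part of dataSDA.
sketch_igap_to_mm <- function(df) {
  out <- list()
  for (nm in names(df)) {
    col <- df[[nm]]
    present <- col[!is.na(col)]
    # Treat a character column as interval-valued when every
    # non-missing entry contains a comma.
    if (is.character(col) && length(present) > 0 &&
        all(grepl(",", present, fixed = TRUE))) {
      parts <- strsplit(col, ",", fixed = TRUE)
      out[[paste0(nm, "_min")]] <- as.numeric(vapply(parts, `[`, character(1), 1))
      out[[paste0(nm, "_max")]] <- as.numeric(vapply(parts, `[`, character(1), 2))
    } else {
      out[[nm]] <- col   # leave other columns untouched
    }
  }
  as.data.frame(out, stringsAsFactors = FALSE)
}

# Made-up iGAP-style input: one label column, one interval column.
igap <- data.frame(id = c("a", "b"),
                   x  = c("1.2,3.4", "2,5"),
                   stringsAsFactors = FALSE)
mm <- sketch_igap_to_mm(igap)
names(mm)
```

Each interval variable x becomes two numeric columns, x_min and x_max, while non-interval columns are carried over unchanged.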

See Also

write_symbolic_csv, int_detect_format, int_convert_format

Examples

# Write then read back an interval dataset
data(mushroom.int.mm)
tmp <- tempfile(fileext = ".csv")
write_symbolic_csv(mushroom.int.mm, tmp)
df <- read_symbolic_csv(tmp)
head(df)

# Write then read back a histogram dataset
data(airline_flights.hist)
tmp2 <- tempfile(fileext = ".csv")
write_symbolic_csv(airline_flights.hist, tmp2)
df2 <- read_symbolic_csv(tmp2)
head(df2)

dataSDA documentation built on June 12, 2026, 9:06 a.m.