read_symbolic_csv: Read a Symbolic Data CSV File
In dataSDA: Datasets and Basic Statistics for Symbolic Data Analysis

Description

Reads an external CSV file containing symbolic data, automatically detects whether the data are interval-valued (paired min/max columns or comma-separated pairs), histogram-valued, modal-valued, or of another symbolic type, and returns the result as a data.frame.

Usage

read_symbolic_csv(
  file,
  sep = ",",
  header = TRUE,
  row.names = NULL,
  stringsAsFactors = FALSE,
  na.strings = c("", "NA"),
  symbolic_type = NULL,
  ...
)

Arguments

`file`	Path to the CSV file to read.
`sep`	Field separator character. Default `","`.
`header`	Logical; does the first row contain column names? Default `TRUE`.
`row.names`	Column number or character string giving row names. Passed to `read.table`. Default `NULL` (automatic).
`stringsAsFactors`	Logical; should character columns be converted to factors? Default `FALSE`.
`na.strings`	Character vector of strings to interpret as `NA`. Default `c("", "NA")`.
`symbolic_type`	Optional character string to override automatic type detection. One of `"interval"`, `"histogram"`, `"modal"`, or `"other"`. When `NULL` (the default) the type is detected automatically.
`...`	Additional arguments passed to `read.table`.
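Because several of these arguments are documented as being passed to `read.table`, a small base-R sketch can show what the defaults do to a raw file. This does not call `read_symbolic_csv`; the file contents and the name `csv_path` are made up for illustration, and how the package forwards the arguments internally is an assumption.

```r
# Base-R sketch only; it does not call read_symbolic_csv().
# The file contents below are invented for illustration.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "name,height_min,height_max",
  "a,1.5,2.5",
  "b,,4.0",
  "c,NA,5.0"
), csv_path)

# Same defaults as listed in the Arguments table above.
raw <- read.table(
  csv_path,
  sep = ",",
  header = TRUE,
  row.names = NULL,
  stringsAsFactors = FALSE,
  na.strings = c("", "NA")
)

str(raw)          # name stays character; empty and "NA" cells become NA
unlink(csv_path)  # remove the temporary file
```

With `stringsAsFactors = FALSE` the `name` column is kept as character rather than converted to a factor, and both the empty cell and the literal `NA` are read as missing values.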

Details

The detection heuristic works as follows:

Interval (MM): If the file contains paired _min/_max columns, the data is returned as-is (MM format).
Interval (iGAP): If one or more character columns contain comma-separated numeric pairs (e.g., "1.2,3.4"), they are expanded into _min/_max column pairs and the result is returned in MM format.
Histogram / Modal: If columns follow a VarName(bin) naming pattern (e.g., Crime(violent)) and the proportions within each variable group sum to approximately 1, the data is classified as histogram or modal. It is returned as a plain data.frame.
Other: If none of the above patterns match, the data is returned as a plain data.frame.
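The rules above can be sketched in base R. This is a rough illustration of the documented heuristic, not the code used inside dataSDA: the function name `sketch_detect_type`, the returned labels, the order of the checks, the loose numeric pattern, and the tolerance `tol = 0.01` are all assumptions.

```r
# Hypothetical helper illustrating the documented rules; not the package's code.
sketch_detect_type <- function(df, tol = 0.01) {
  nms <- names(df)

  # 1. Interval (MM): some "<var>_min" column has a matching "<var>_max".
  min_vars <- sub("_min$", "", nms[grepl("_min$", nms)])
  if (length(min_vars) > 0 && any(paste0(min_vars, "_max") %in% nms)) {
    return("interval_mm")
  }

  # 2. Interval (iGAP): a character column whose non-missing entries all
  #    look like "number,number" (deliberately loose numeric pattern).
  pair_pattern <- "^[[:space:]]*-?[0-9.]+[[:space:]]*,[[:space:]]*-?[0-9.]+[[:space:]]*$"
  is_pair_col <- vapply(df, function(col) {
    is.character(col) && any(!is.na(col)) &&
      all(grepl(pair_pattern, col[!is.na(col)]))
  }, logical(1))
  if (any(is_pair_col)) {
    return("interval_igap")
  }

  # 3. Histogram / modal: "VarName(bin)" columns whose values sum to about 1
  #    in every row, within each variable group.
  bin_pattern <- "^(.+)\\((.+)\\)$"
  bin_cols <- nms[grepl(bin_pattern, nms)]
  if (length(bin_cols) > 0) {
    groups <- split(bin_cols, sub(bin_pattern, "\\1", bin_cols))
    sums_ok <- vapply(groups, function(cols) {
      vals <- df[, cols, drop = FALSE]
      all(vapply(vals, is.numeric, logical(1))) &&
        all(abs(rowSums(vals) - 1) < tol)
    }, logical(1))
    if (isTRUE(all(sums_ok))) {
      return("histogram_or_modal")
    }
  }

  # 4. Nothing matched.
  "other"
}

# Small made-up inputs, one per rule.
mm <- data.frame(height_min = c(1, 2), height_max = c(3, 4))
igap <- data.frame(height = c("1.2,3.4", "2,5"), stringsAsFactors = FALSE)
hist_df <- data.frame(
  "Crime(violent)" = c(0.2, 0.5),
  "Crime(property)" = c(0.8, 0.5),
  check.names = FALSE  # keep the parentheses in the column names
)
plain <- data.frame(x = 1:2, label = c("a", "b"), stringsAsFactors = FALSE)

sketch_detect_type(mm)
sketch_detect_type(igap)
sketch_detect_type(hist_df)
sketch_detect_type(plain)
```

Note the use of `check.names = FALSE` when building `hist_df`: base R otherwise rewrites names such as `Crime(violent)` into syntactically valid ones, which would hide the `VarName(bin)` pattern. Whether the package relies on the same setting is not stated in this page.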

Value

A data.frame. Interval data is returned in MM format (paired _min/_max columns). All other symbolic types are returned as plain data frames.
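To make the MM layout concrete, the following base-R sketch expands comma-separated pairs into paired _min/_max columns. The helper name `sketch_igap_to_mm`, its `cols` argument, and the resulting column order are assumptions made for illustration; the package may expand and order columns differently.

```r
# Hypothetical helper: expands "lower,upper" strings into <name>_min / <name>_max.
# Expanded columns are appended after the untouched ones (an arbitrary choice).
sketch_igap_to_mm <- function(df, cols) {
  out <- df[, setdiff(names(df), cols), drop = FALSE]
  for (col in cols) {
    parts <- strsplit(df[[col]], ",", fixed = TRUE)
    out[[paste0(col, "_min")]] <- vapply(parts, function(p) as.numeric(p[1]), numeric(1))
    out[[paste0(col, "_max")]] <- vapply(parts, function(p) as.numeric(p[2]), numeric(1))
  }
  out
}

# Made-up input with one comma-separated interval column.
igap <- data.frame(
  id = c("a", "b"),
  height = c("1.2,3.4", "2,5"),
  stringsAsFactors = FALSE
)

mm <- sketch_igap_to_mm(igap, "height")
mm
```

The result has one row per observation and two numeric columns per interval variable, here `height_min` and `height_max`, alongside the unchanged `id` column.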

Examples

library(dataSDA)

# Write then read back an interval dataset
data(mushroom.int.mm)
tmp <- tempfile(fileext = ".csv")
write_symbolic_csv(mushroom.int.mm, tmp)
df <- read_symbolic_csv(tmp)
head(df)

# Write then read back a histogram dataset
data(airline_flights.hist)
tmp2 <- tempfile(fileext = ".csv")
write_symbolic_csv(airline_flights.hist, tmp2)
df2 <- read_symbolic_csv(tmp2)
head(df2)

dataSDA documentation built on June 12, 2026, 9:06 a.m.