database_schema: Database schemas
In autodb: Automatic Database Normalisation for Data Frames

database_schema

R Documentation

Database schemas

Description

Enhances a relation_schema object with foreign key reference information.

Usage

database_schema(relation_schemas, references)

Arguments

relation_schemas

a relation_schema object, as returned by synthesise or relation_schema.

references

a list of references, each represented by a list containing four character elements. In order, the elements are a scalar giving the name of the child (referrer) schema, a vector giving the child attribute names, a scalar giving the name of the parent (referee) schema, and a vector giving the parent attribute names. The vectors must be of the same length and contain names for attributes present in their respective schemas, and the parent attributes must form a key.

Details

Unlike functional_dependency and relation_schema, database_schema is not designed to be vector-like: it only holds a single database schema. This adheres to the usual package use case, where a single data frame is being analysed at a time. However, it inherits from relation_schema, so is vectorised with respect to its relation schemas.

As with relation_schema, duplicate relation schemas, after ordering by attribute, are allowed, and can be removed with unique.

References, i.e. foreign key references, are allowed to have different attribute names in the child and parent relations; this can't occur in the output for autoref and normalise.

Subsetting removes any references that involve removed relation schemas. Removing duplicates with unique changes references involving duplicates to involve the kept equivalent schemas instead. Renaming relation schemas with names<- also changes their names in the references.

Value

A database_schema object, containing relation_schemas with references stored in an attribute of the same name. References are stored with their attributes in the order they appear in their respective relation schemas.

Examples

rs <- relation_schema(
  list(
    a = list(c("a", "b"), list("a")),
    b = list(c("b", "c"), list("b", "c"))
  ),
  attrs_order = c("a", "b", "c", "d")
)
ds <- database_schema(
  rs,
  list(list("a", "b", "b", "b"))
)
print(ds)
attrs(ds)
keys(ds)
attrs_order(ds)
names(ds)
references(ds)

# relations can't reference themselves
## Not run: 
  database_schema(
    relation_schema(
      list(a = list("a", list("a"))),
      c("a", "b")
    ),
    list(list("a", "a", "a", "a"))
  )
  database_schema(
    relation_schema(
      list(a = list(c("a", "b"), list("a"))),
      c("a", "b")
    ),
    list(list("a", "b", "a", "a"))
  )

## End(Not run)

# an example with references between differently-named attributes
print(database_schema(
  relation_schema(
    list(
      citation = list(c("citer", "citee"), list(c("citer", "citee"))),
      article = list("article", list("article"))
    ),
    c("citer", "citee", "article")
  ),
  list(
    list("citation", "citer", "article", "article"),
    list("citation", "citee", "article", "article")
  )
))

# vector operations
ds2 <- database_schema(
  relation_schema(
    list(
      e = list(c("a", "e"), list("e"))
    ),
    attrs_order = c("a", "e")
  ),
  list()
)
c(ds, ds2) # attrs_order attributes are merged
unique(c(ds, ds))

# subsetting
ds[1]
stopifnot(identical(ds[[1]], ds[1]))
ds[c(1, 2, 1, 2)] # replicates the foreign key references
c(ds[c(1, 2)], ds[c(1, 2)]) # doesn't reference between separate copies of ds
unique(ds[c(1, 2, 1, 2)]) # unique() also merges references

# another example of unique() merging references
ds_merge <- database_schema(
  relation_schema(
    list(
      a = list(c("a", "b"), list("a")),
      b = list(c("b", "c", "d"), list("b")),
      c_d = list(c("c", "d", "e"), list(c("c", "d"))),
      a.1 = list(c("a", "b"), list("a")),
      b.1 = list(c("b", "c", "d"), list("b"))
    ),
    c("a", "b", "c", "d", "e")
  ),
  list(
    list("a", "b", "b", "b"),
    list("b.1", c("c", "d"), "c_d", c("c", "d"))
  )
)
print(ds_merge)
unique(ds_merge)

# reassignment
# can't change keys included in references
## Not run: keys(ds)[[2]] <- list("c")
# can't remove attributes included in keys
## Not run: attrs(ds)[[2]] <- list("c", "d")
# can't remove attributes included in references
## Not run: attrs(ds)[[1]] <- c("a", "d")
ds3 <- ds
# can change subset of schema, but loses references between altered and
# non-altered subsets
ds3[2] <- database_schema(
  relation_schema(
    list(d = list(c("d", "c"), list("d"))),
    attrs_order(ds3)
  ),
  list()
)
print(ds3) # note the schema's name doesn't change
# names(ds3)[2] <- "d" # this would change the name
keys(ds3)[[2]] <- list(character()) # removing keys first...
attrs(ds3)[[2]] <- c("b", "c") # so we can change the attrs legally
keys(ds3)[[2]] <- list("b", "c") # add the new keys
# add the reference lost during subset replacement
references(ds3) <- c(references(ds3), list(list("a", "b", "b", "b")))
stopifnot(identical(ds3, ds))

# changing appearance priority for attributes
attrs_order(ds3) <- c("d", "c", "b", "a")
print(ds3)

# changing relation schema names changes them in references
names(ds3) <- paste0(names(ds3), "_long")
print(ds3)

# reconstructing from components
ds_recon <- database_schema(
  relation_schema(
    Map(list, attrs(ds), keys(ds)),
    attrs_order(ds)
  ),
  references(ds)
)
stopifnot(identical(ds_recon, ds))
ds_recon2 <- database_schema(
  subschemas(ds),
  references(ds)
)
stopifnot(identical(ds_recon2, ds))

# can be a data frame column
data.frame(id = 1:2, schema = ds)

autodb documentation built on Nov. 5, 2025, 6:49 p.m.