Using sparta

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

A probability mass function can be represented by a multi-dimensional array. However, for high-dimensional distributions where each variable may have a large state space, lack of computer memory can become a problem. For example, an $80$-dimensional random vector in which each variable has $10$ levels will lead to a state space with $10^{80}$ cells. Such a distribution can not be stored in a computer; in fact, $10^{80}$ is one of the estimates of the number of atoms in the universe. However, if the array consists of only a few non-zero values, we need only store these values along with information about their location. That is, a sparse representation of a table. Sparta was created for efficient multiplication and marginalization of sparse tables.

How to use sparta

library(sparta)

Consider two arrays f and g:

dn <- function(x) setNames(lapply(x, paste0, 1:2), toupper(x))
d  <- c(2, 2, 2)
f  <- array(c(5, 4, 0, 7, 0, 9, 0, 0), d, dn(c("x", "y", "z")))
g  <- array(c(7, 6, 0, 6, 0, 0, 9, 0), d, dn(c("y", "z", "w")))

with flat layouts

ftable(f, row.vars = "X")
ftable(g, row.vars = "W")

We can convert these to their equivalent sparta versions as

sf <- as_sparta(f); sg <- as_sparta(g)

Printing the object by the default printing method yields

print.default(sf)

The columns are the cells in the sparse matrix and the vals attribute are the corresponding values which can be extracted with the vals function. Furthermore, the domain resides in the dim_names attribute, which can also be extracted using the dim_names function. From the output, we see that (x2, y2, z1) has a value of $2$. Using the sparta print method prettifies things:

print(sf)

where row $i$ corresponds to column $i$ in the sparse matrix. The product of sf and sg

mfg <- mult(sf, sg); mfg

Converting sf into a conditional probability table (CPT) with conditioning variable Z:

sf_cpt <- as_cpt(sf, y = "Z"); sf_cpt

Slicing sf on X1 = x1 and dropping the X dimension

slice(sf, s = c(X = "x1"), drop = TRUE)

reduces sf to a single non-zero element, whereas the equivalent dense case would result in a (Y,Z) table with one non-zero element and three zero-elements.

Marginalizing (or summing) out Y in sg yields

marg(sg, y = c("Y"))

Finally, we mention that a sparse table can be created using the constructor sparta_struct, which can be necessary to use if the corresponding dense table is too large to have in memory.

Functionalities in sparta

| Function name | Description | |:-------------------------|:-------------------------------------------------------------------| | as_<sparta> | Convert \code{array}-like object to a sparta | | as_<array/df/cpt> | Convert sparta object to an array/data.frame/CPT | | sparta_struct | Constructor for sparta objects | | mult, div, marg, slice | Multiply/divide/marginalize/slice | | normalize | Normalize (the values of the result sum to one) | | get_val | Extract the value for a specific named cell | | get_cell_name | Extract the named cell | | get_values | Extract the values | | dim_names | Extract the domain | | names | Extract the variable names | | max/min | The maximum/minimum value | | which_<max/min>_cell | The column index referring to the max/min value | | which_<max/min>_idx | The configuration corresponding to the max/min value | | sum | Sum the values | | equiv | Test if two tables are identical up to permutations of the columns | | | |



Try the sparta package in your browser

Any scripts or data that you put into this service are public.

sparta documentation built on April 12, 2022, 9:06 a.m.