knitr::opts_chunk$set(echo = TRUE) options(rmarkdown.html_vignette.check_title = FALSE) library("frab") library("mvtnorm") set.seed(1)
knitr::include_graphics(system.file("help/figures/frab.png", package = "frab"))
To cite in publications, please use @hankin2023_frab.
TLDR: In R, adding two objects of class table
has a natural
interpretation. However, in base R, adding two tables can give
plausible but incorrect results. The frab
package provides a
consistent and efficient way to add table
objects, subject to
disordR
discipline [@hankin2022_disordR]. The underlying
mathematical structure is the Free Abelian group, hence "frab
".
table()
Suppose we have three R tables:
x <- table(c("a","a","b","c","d","d","a")) y <- table(c("a","a","b","d","d","d","e")) z <- table(c("a","a","b","d","d","e","f"))
Can we ascribe any meaning to x+y
without referring to the arguments
sent to table()
? Well yes, we should simply sum the counts of the
various letters. However:
x
y
x+y
The sum is defined in this case. However, close inspection shows that
the result is clearly incorrect. Although entries for a
and b
are
correct, the third and fourth entries are not as expected: in this
case R idiom simply adds the entries elementwise with no regard to
labels. We would expect x+y
to respect the fact that we have 5 d
entries, even though element d
is the fourth entry of x
and the
third of y
. Further:
x z
x+z
## Error in x + z: non-conformable arrays
Above we see that x
and z
do not have a well-defined sum, in the
sense that x+z
returns, quite reasonably, an error.
A named vector is a vector with a names
attribute. Each element
of a named vector is associated with a name or label. The names are
not necessarily unique. It allows you to assign a name to each
element, making it easier to refer to specific values within the
vector using their respective names. Named vectors are a convenient
and useful feature of the R programming language [@rcore2022].
However, consider the following two named vectors:
x <- c(a=1,b=2,c=3) y <- c(c=4,b=1,a=1)
Given that x+y
returns a named vector, there are at least two
plausible values that it might give, viz:
c(a=5,b=3,c=4)
or
c(a=2,b=3,c=7)
.
In the first case the elements of x
and y
are added pairwise, and
the names
attribute is taken from the first of the addends. In
the second, the names are considered to be primary and the value of
each name in the sum is the sum of the values of that name of the
addends. Note further that there is no good reason why the first
answer could not be c(c=5,b=3,a=4)
, obtained by using the names
attribute of y
instead of x
.
frab
packageThe frab
package furnishes efficient methods to give a consistent
and meaningful way of adding two R tables together, using standard R
syntax. It uses the names of a named vector as the indexing
mechanism. Package idiom is straightforward:
(x <- frab(c(a=1,b=2,d=7))) (y <- frab(c(c=4,b=1,a=-1))) x+y
Above, note how y
is defined with its entries in non-standard order,
but the resulting frab
object has its entries ordered
alphabetically. In x+y
, the entry for a
has vanished, as it
cancels in the summation. The numeric entries for each letter are
summed, accounting for the different names [viz a,b,d
and a,b,c
respectively]. The result is presented using the frab
print method.
Package idiom includes extraction and replacement methods, all of which should work as expected:
x <- frab(c(x=5,d=1,e=2,f=4,a=3,c=3,g=9)) x x>3 x<3 x[x>3] x[x<3] x[x<3] <- 100 x
Above we see that extraction and replacement methods follow disordR
discipline [@hankin2022_disordR]. Results are coerced to disord
objects if needed. Tables may be added to frab
objects:
a <- rfrab() b <- table(sample(letters[1:8],12,replace=T)) a b a+b
Above we see the +
operator is defined between a frab
and a
table
, coercing R tables to frab
objects to give consistent results.
If we pass a named vector with repeated names, the values are added:
frab(c(a=4,b=2,a=1))
The general rule is, given two named vectors x
and y
, that
frab(x)+frab(y)
is identical to frab(c(x,y))
.
The ideas above have a natural generalization to two-dimensional R tables.
(x <- rspar2(9)) (y <- rspar2(9)) x+y
Above, note that the resulting sum is automatically resized to
accommodate both addends, and also that entries with nonzero values in
both x
and y
are correctly summed.
The one- and two- dimensional R tables above have somewhat specialized
print methods and the general case with dimension $\geqslant 3$ uses
methods similar to those of the spray
package. We can generate a
sparsetable
object quite easily:
A <- matrix(0.95,3,3) diag(A) <- 1 x <- round(rmvnorm(300,mean=rep(10,3),sigma=A/7)) x[] <- letters[x] head(x) (sx <- sparsetable(x))
But we can add sx
to other sparsetable
objects:
(sz <- sparsetable(matrix(sample(letters[9:11],12,replace=TRUE),ncol=3),1001:1004))
Then the usual semantics for addition operate:
sx + sz
The word "table" means something unrelated in SQL
. A short
discussion of frab
functionality implemented in SQL
"table"
objects is given in inst/sql.Rmd
.
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.