Hybrid R/SciDB factors

Description

Create R factor variables that include SciDB dimension array index values.

Usage

factor_scidb(x, levels)

Arguments

x

An R vector or factor vector.

levels

A SciDB dimension array. This can be an array created with the SciDB index_lookup function, or any one-dimensional SciDB array. If the array has more than one SciDB attribute, the first attribute will be used to look up values from x.

Value

A factor vector with extra class scidb_factor, and two additional attributes: scidb_levels contains an R vector of looked-up SciDB index values from the levels array, and scidb_index contains a reference to the levels array.

Note

scidb_factor values are treated specially when uploaded from R to SciDB. Ordinary R factor values are uploaded as their contents, but scidb_factor values are uploaded as the index values of their SciDB dimension array (as type int64). Those values can then be joined directly with the SciDB dimension array, or with any SciDB array that uses the same dimension index. See the examples below.

Author(s)

B. W. Lewis <blewis@paradigm4.com>

Examples

## Not run: 
# Consider a SciDB dimension array of Iris flower species, perhaps with
# some additional data (made up in this example).
set.seed(1)
species <- data.frame(
  species=c("albicans","flavescens","germanica","setosa", "variegata", "versicolor", "virginica"),
  additional_data = runif(7))

species <- as.scidb(species, dimlabel="index")
str(species)

# Fisher's iris data contain a subset of these species.
str(iris$Species)

# Let's index the iris data with the SciDB array.
iris$Species <- factor_scidb(iris$Species, species)

# The Species variable in the data frame now contains *two* indices, the usual
# R enumeration of levels, and a new set of levels corresponding to the SciDB
# lookup array. It also contains a reference to the lookup array.
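
# A hedged sketch for inspecting the result, assuming the class and attribute
# names given in the Value section (scidb_factor, scidb_levels, scidb_index):
class(iris$Species)                  # expected to include "scidb_factor"
attr(iris$Species, "scidb_levels")   # looked-up SciDB index values
attr(iris$Species, "scidb_index")    # reference to the lookup array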

# Let's upload the newly indexed iris data to SciDB. Observe that the 'Species'
# values are uploaded as SciDB int64 index values! Those indices are join-able
# with the dimension array used to create the factor.
x <- as.scidb(iris)
str(x)

# The advantage is that the x$Species values can be joined or redimensioned
# conformably with the SciDB indexing array.

# The next example computes the average iris data values grouped by Species
# using SciDB's redimension function:
xr <- redimension(x, dim="Species", FUN=mean)

# That output is join-able with the SciDB dimension array species:
merge(xr,species, by.x="Species", by.y="index")[]

# ...the output should look something like this...
#  Sepal_Length Sepal_Width Petal_Length Petal_Width    species additional_data
#4        5.006       3.428        1.462       0.246     setosa       0.9082078
#6        5.936       2.770        4.260       1.326 versicolor       0.8983897
#7        6.588       2.974        5.552       2.026  virginica       0.9446753


## End(Not run)