Hybrid R/SciDB factors
Create R factor variables that include SciDB dimension array index values.
An R vector or factor vector.
A SciDB dimension array. This can be an array created with the SciDB
A factor vector with extra class scidb_factor, and two additional attributes: scidb_levels contains an R vector of looked-up SciDB index values from the levels array, and scidb_index contains a reference to that array.
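The shape of the returned object can be illustrated in plain R without a SciDB connection. The sketch below is only an approximation under stated assumptions: `make_hybrid` is a hypothetical helper (not part of the package), a data frame stands in for the SciDB dimension array, and the looked-up index values are assumed to be stored in the order of the R levels.

```r
# Stand-in for a SciDB dimension array: index values mapped to labels.
# (Assumption: in real use this array lives in SciDB, not in a data frame.)
lookup <- data.frame(
  index   = 1:7,
  species = c("albicans", "flavescens", "germanica", "setosa",
              "variegata", "versicolor", "virginica"),
  stringsAsFactors = FALSE)

# An ordinary R factor; its levels are a subset of the lookup labels.
f <- factor(c("setosa", "virginica", "setosa", "versicolor"))

# Hypothetical helper that mimics the documented return value:
# a factor with an extra class and two extra attributes.
make_hybrid <- function(x, lookup) {
  x <- as.factor(x)
  # Look up the dimension index of each R level.
  attr(x, "scidb_levels") <- lookup$index[match(levels(x), lookup$species)]
  # The real object holds a reference to a SciDB array; here, the data frame.
  attr(x, "scidb_index") <- lookup
  class(x) <- c("scidb_factor", class(x))
  x
}

h <- make_hybrid(f, lookup)
class(h)                  # extra class first, then "factor"
attr(h, "scidb_levels")   # lookup index for each R level
as.integer(h)             # the usual R level codes are still available
```

The object remains a regular factor for ordinary R code, which is why the extra information is carried in attributes rather than replacing the level codes.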
scidb_factor values are treated specially when uploaded from R to SciDB. Normal R factor values upload their contents, but scidb_factor values upload the index values corresponding to their SciDB dimension array (as type int64). Those values can then be joined directly with the SciDB dimension array, or with any SciDB array that uses the same dimension index. See the examples below.
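The difference between the two upload behaviours can be sketched in base R. This is an illustration only, not the package's implementation: data frames stand in for SciDB arrays, "contents" is assumed to mean the factor labels, and the names `lookup`, `obs`, `as_contents`, and `as_index` are hypothetical.

```r
# Stand-in for the SciDB dimension array (assumption: a plain data frame).
lookup <- data.frame(
  index   = 1:7,
  species = c("albicans", "flavescens", "germanica", "setosa",
              "variegata", "versicolor", "virginica"),
  stringsAsFactors = FALSE)

# A few made-up observations with an ordinary factor column.
obs <- data.frame(
  Species      = factor(c("setosa", "virginica", "setosa", "versicolor")),
  Petal_Length = c(1.4, 6.0, 1.3, 4.7))

# A normal factor contributes its contents (assumed here to be the labels).
as_contents <- as.character(obs$Species)

# A scidb_factor contributes the dimension index of each value instead.
as_index <- lookup$index[match(as.character(obs$Species), lookup$species)]

# What the uploaded data would look like with index values in place of labels.
uploaded <- data.frame(Species = as_index, Petal_Length = obs$Petal_Length)

# Because the column holds dimension index values, it joins directly
# against the lookup without any string matching.
joined <- merge(uploaded, lookup, by.x = "Species", by.y = "index")
joined
```

Joining on integer index values rather than labels is what lets the uploaded data line up with any other array that shares the same dimension index.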
B. W. Lewis <email@example.com>
## Not run:
# Consider a SciDB dimension array of Iris flower species, perhaps with
# some additional data (made up in this example).
set.seed(1)
species <- data.frame(
  species = c("albicans", "flavescens", "germanica", "setosa",
              "variegata", "versicolor", "virginica"),
  additional_data = runif(7))
species <- as.scidb(species, dimlabel="index")
str(species)

# Fisher's iris data example contains a subset of these species.
str(iris$Species)

# Let's index the iris data with the SciDB array.
iris$Species <- factor_scidb(iris$Species, species)

# The Species variable in the data frame now contains *two* indices, the usual
# R enumeration of levels, and a new set of levels corresponding to the SciDB
# lookup array. It also contains a reference to the lookup array.

# Let's upload the newly indexed iris data to SciDB. Observe that the 'Species'
# values are uploaded as SciDB int64 index values! Those indices are join-able
# with the dimension array used to create the factor.
x <- as.scidb(iris)
str(x)

# The advantage is that the x$Species values can be joined or redimensioned
# conformably with the SciDB indexing array.

# The next example computes the average iris data values grouped by Species
# using SciDB's redimension function:
xr <- redimension(x, dim="Species", FUN=mean)

# That output is join-able with the SciDB dimension array species:
merge(xr, species, by.x="Species", by.y="index")

# ...the output should look something like this...
#   Sepal_Length Sepal_Width Petal_Length Petal_Width    species additional_data
#4         5.006       3.428        1.462       0.246     setosa       0.9082078
#6         5.936       2.770        4.260       1.326 versicolor       0.8983897
#7         6.588       2.974        5.552       2.026  virginica       0.9446753

## End(Not run)