library(largeList)
knitr::opts_chunk$set(
  comment = "#", error = FALSE, tidy = FALSE, cache = FALSE, collapse = TRUE
)
The package largeList is designed to handle large list objects in R. In many business and engineering scenarios, huge amounts of unstructured data need to be stored in list objects, which causes problems with both RAM consumption and running time. This package serializes, compresses and saves the elements of a list separately, which makes it possible to access elements stored in files randomly.
R objects are serialized in an uncompressed or compressed (zlib, default level) binary little-endian format, similar to saveRDS. Two ordered tables are written at the end of the data for fast lookups: one for indices and one for element names. Note that all names are truncated to 16 characters.
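The encoding step can be sketched in base R. This is a minimal sketch under assumptions: the helpers encode_element, decode_element and truncate_name are hypothetical, and serialize() followed by memCompress(type = "gzip") (a zlib stream at the library's default level) stands in for whatever largeList does internally.

```r
# Hypothetical helpers, not largeList's API: serialize in native byte order
# (xdr = FALSE, i.e. little-endian on x86/ARM), then optionally zlib-compress.
encode_element <- function(x, compress = TRUE) {
  bytes <- serialize(x, connection = NULL, xdr = FALSE)
  if (compress) memCompress(bytes, type = "gzip") else bytes
}

# Reverse the two steps: decompress if needed, then unserialize.
decode_element <- function(bytes, compress = TRUE) {
  if (compress) bytes <- memDecompress(bytes, type = "gzip")
  unserialize(bytes)
}

# Mirror the 16-character limit on stored names.
truncate_name <- function(name) substr(name, 1, 16)

df <- data.frame(col_1 = 1:2, col_2 = 3:4)
identical(decode_element(encode_element(df)), df)  # TRUE
truncate_name("a_very_long_element_name")          # "a_very_long_elem"
```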
Given the indices or names of elements, their positions are either read directly from the index table or found by binary search in the name-position table. The requested elements are then located and unserialized, so the whole list is never restored into memory.
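A sketch of this lookup path, assuming elements are written back to back and a name table sorted by name stores each element's byte offset and size. The file layout, name_table and read_by_name are hypothetical; they illustrate binary search followed by a seek and do not reproduce the real largeList file format.

```r
# Hypothetical layout: serialized elements stored back to back in one file.
elements <- list(B = "abc", A = c(1, 2), C = matrix(0, nrow = 2, ncol = 2))
encoded <- lapply(elements, serialize, connection = NULL, xdr = FALSE)
sizes <- vapply(encoded, length, integer(1), USE.NAMES = FALSE)
offsets <- c(0, cumsum(sizes)[-length(sizes)])

path <- tempfile(fileext = ".bin")
con <- file(path, open = "wb")
for (bytes in encoded) writeBin(bytes, con)
close(con)

# Name-position table, sorted by name so it can be binary searched.
ord <- order(names(elements))
name_table <- data.frame(name = names(elements)[ord],
                         offset = offsets[ord],
                         size = sizes[ord],
                         stringsAsFactors = FALSE)

# Binary search the sorted names, then seek to the offset and read only
# that element's bytes; the rest of the file is never loaded.
read_by_name <- function(path, table, key) {
  lo <- 1L
  hi <- nrow(table)
  while (lo <= hi) {
    mid <- (lo + hi) %/% 2L
    if (table$name[mid] == key) {
      con <- file(path, open = "rb")
      on.exit(close(con))
      seek(con, where = table$offset[mid])
      return(unserialize(readBin(con, what = "raw", n = table$size[mid])))
    }
    if (table$name[mid] < key) lo <- mid + 1L else hi <- mid - 1L
  }
  stop("name not found: ", key)
}

read_by_name(path, name_table, "A")  # c(1, 2)
```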
In the current version, only basic data types are supported: NULL, integer, numeric, character, complex, raw, logical, factor, list, matrix, array and data.frame. Types such as function and data.table are not supported.
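A rough pre-check for these types can be written in base R. is_supported is a hypothetical helper that only approximates the list above: it accepts NULL, atomic vectors (which covers factor, matrix and array) and lists or data frames whose contents are themselves supported, and rejects functions and data.table objects.

```r
# Hypothetical approximation of the supported-type list; not part of largeList.
is_supported <- function(x) {
  if (is.null(x)) return(TRUE)
  # data.table inherits from list, so reject it before the list branch.
  if (is.function(x) || inherits(x, "data.table")) return(FALSE)
  # Lists and data frames are supported if every component is.
  if (is.list(x)) return(all(vapply(x, is_supported, logical(1))))
  # Atomic vectors include factor, matrix and array.
  is.atomic(x)
}

is_supported(list(1L, "a", factor("x"), data.frame(a = 1)))  # TRUE
is_supported(list(1, sum))                                   # FALSE
```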
The maximum supported size of each R object stored in the list is $2^{31} - 1$ bytes.
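This limit is R's largest 32-bit integer, one byte short of 2 GiB. The check below is a sketch assuming the limit applies to the serialized, uncompressed size; fits_limit is a hypothetical helper.

```r
# 2^31 - 1 bytes equals .Machine$integer.max (2147483647).
max_bytes <- 2^31 - 1
max_bytes == .Machine$integer.max  # TRUE

# Hypothetical pre-check: serializes x once just to measure its size.
fits_limit <- function(x, limit = 2^31 - 1) {
  length(serialize(x, connection = NULL, xdr = FALSE)) <= limit
}

fits_limit(runif(1e6))  # TRUE: roughly 8 MB once serialized
```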
There are two ways to use the package: via the basic functions or via overloaded operators.
The basic functions are saveList, readList, removeFromList, modifyInList, modifyNameInList, getListName and getListLength.
If the parameter append is FALSE, the file is created if it does not exist, or truncated if it already exists. If append = TRUE, the list object is appended to the file using the compression setting already stored in that file.
# save list_1 to a new file called example.llo using compression
list_1 <- list("A" = c(1, 2), "B" = "abc", list(1, 2))
saveList(object = list_1, file = "example.llo", append = FALSE, compress = TRUE)
# append list_2 to the existing file example.llo; the compress option is taken from the file
list_2 <- list("C" = data.frame(col_1 = 1:2, col_2 = 3:4), "D" = matrix(0, nrow = 2, ncol = 2))
saveList(object = list_2, file = "example.llo", append = TRUE)
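The append parameter behaves like the two binary write modes of a base R file connection. This analogy uses only base R, not largeList code, and write_bytes is a hypothetical helper: append = FALSE corresponds to mode "wb" (create or truncate) and append = TRUE to mode "ab" (append).

```r
# Hypothetical helper: write raw bytes, truncating or appending.
write_bytes <- function(path, bytes, append) {
  con <- file(path, open = if (append) "ab" else "wb")
  on.exit(close(con))
  writeBin(bytes, con)
}

path <- tempfile()
write_bytes(path, as.raw(1:3), append = FALSE)  # creates the file
write_bytes(path, as.raw(4:5), append = TRUE)   # appends two bytes
file.size(path)                                 # 5
write_bytes(path, as.raw(6), append = FALSE)    # truncates, then writes
file.size(path)                                 # 1
```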
Different kinds of indices can be used in readList to access data.
# all elements
list_read <- readList(file = "example.llo")
# by numeric indices
list_read <- readList(file = "example.llo", index = c(1, 3))
# by names
list_read <- readList(file = "example.llo", index = c("A", "B"))
# by logical indices
list_read <- readList(file = "example.llo", index = c(TRUE, FALSE, TRUE, FALSE, TRUE))
removeFromList removes elements, again with any kind of index. This function may relocate all the data in the stored file and can therefore be very slow. Remove elements in batches rather than one index at a time.
# copy the file
file.copy(from = "example.llo", to = "example_remove.llo")
# by numeric indices
removeFromList(file = "example_remove.llo", index = c(2))
# by names
removeFromList(file = "example_remove.llo", index = c("A", "D"))
# by logical indices
removeFromList(file = "example_remove.llo", index = c(TRUE, FALSE))
# remove the file
file.remove("example_remove.llo")
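Why batching helps can be seen with a simple cost model. The model is an assumption, not a description of largeList internals: it supposes that deleting an element shifts every byte stored after it, so removing several elements one by one moves trailing data repeatedly, while one batched compaction moves each surviving byte at most once. Both functions are hypothetical.

```r
# Bytes moved when elements are removed one at a time.
cost_one_by_one <- function(sizes, remove) {
  moved <- 0
  # Remove from the back so the remaining indices stay valid.
  for (i in sort(remove, decreasing = TRUE)) {
    if (i < length(sizes)) moved <- moved + sum(sizes[(i + 1):length(sizes)])
    sizes <- sizes[-i]
  }
  moved
}

# Bytes moved by a single compaction: every kept element behind the first gap.
cost_batch <- function(sizes, remove) {
  keep <- setdiff(seq_along(sizes), remove)
  sum(sizes[keep[keep > min(remove)]])
}

sizes <- rep(100, 5)             # five elements of 100 bytes each
cost_one_by_one(sizes, c(1, 2))  # 600
cost_batch(sizes, c(1, 2))       # 300
```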
modifyInList replaces the elements at the given indices with the values provided in the parameter object. If there are fewer replacement values than indices, the values are reused circularly. This function may relocate all the data in the stored file and can therefore be very slow. Modify elements in batches rather than one at a time.
# copy the file
file.copy(from = "example.llo", to = "example_modify.llo")
# by numeric indices
modifyInList(file = "example_modify.llo", index = c(1, 2), object = list("AA", "BB"))
# by names
modifyInList(file = "example_modify.llo", index = c("C", "D"), object = list("C", "D"))
# by logical indices
modifyInList(file = "example_modify.llo", index = c(TRUE, FALSE), object = list(1, 2))
# remove the file
file.remove("example_modify.llo")
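The circular reuse of replacement values can be mimicked on an ordinary in-memory list. modify_in_memory is a hypothetical helper for illustration only; it handles numeric and name indices and recycles object with rep_len().

```r
# Hypothetical in-memory analogue of the recycling rule (not largeList code).
modify_in_memory <- function(lst, index, object) {
  # Recycle the replacement values to the number of indices.
  values <- rep_len(object, length(index))
  for (k in seq_along(index)) lst[[index[k]]] <- values[[k]]
  lst
}

x <- list(A = 1, B = 2, C = 3)
str(modify_in_memory(x, c("A", "B", "C"), list("AA", "BB")))
```

With three indices and two values, C receives "AA" again.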
modifyNameInList replaces the names of the elements at the given indices with the values provided in the parameter name. If there are fewer replacement values than indices, the values are reused circularly.
# copy the file
file.copy(from = "example.llo", to = "example_modify_name.llo")
# by numeric indices
modifyNameInList(file = "example_modify_name.llo", index = c(1, 2), name = c("new_name_A", "new_name_B"))
# by logical indices
modifyNameInList(file = "example_modify_name.llo", index = c(TRUE, FALSE), name = c("new_name_C", "new_name_D"))
# remove the file
file.remove("example_modify_name.llo")
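For logical indices, one plausible reading is that they are recycled over the list length as in base R, after which the names are reused circularly. The sketch below assumes exactly that; rename_in_memory is hypothetical, and for a five-element list the index c(TRUE, FALSE) selects positions 1, 3 and 5.

```r
# Hypothetical in-memory analogue; assumes base-R-style logical recycling.
rename_in_memory <- function(lst, index, name) {
  # Resolve logical, name or numeric indices to positions.
  pos <- if (is.logical(index)) {
    which(rep_len(index, length(lst)))
  } else if (is.character(index)) {
    match(index, names(lst))
  } else {
    index
  }
  # Reuse the new names circularly over the selected positions.
  names(lst)[pos] <- rep_len(name, length(pos))
  lst
}

x <- list(A = 1, B = 2, C = 3, D = 4, E = 5)
names(rename_in_memory(x, c(TRUE, FALSE), c("new_name_C", "new_name_D")))
# "new_name_C" "B" "new_name_D" "D" "new_name_C"
```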
getListName("example.llo")
getListLength("example.llo")
# remove the file
file.remove("example.llo")
Through operator overloading, list objects stored in a file can be manipulated much like ordinary R lists.
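In R, operator overloading means S3 methods for functions such as [[, [[<- and length that are dispatched on the object's class. The toy class below is hypothetical and far simpler than largeList (one .rds file per element in a directory); it only shows how such methods turn list syntax into file reads and writes.

```r
# Toy S3 class (hypothetical; not how largeList is implemented).
file_list <- function(dir) {
  dir.create(dir, showWarnings = FALSE)
  structure(list(dir = dir), class = "file_list")
}

# x[[i]] reads one element from disk.
`[[.file_list` <- function(x, i) {
  readRDS(file.path(unclass(x)$dir, paste0(i, ".rds")))
}

# x[[i]] <- value writes one element to disk and returns the handle.
`[[<-.file_list` <- function(x, i, value) {
  saveRDS(value, file.path(unclass(x)$dir, paste0(i, ".rds")))
  x
}

# length(x) counts the stored elements.
length.file_list <- function(x) {
  length(list.files(unclass(x)$dir, pattern = "\\.rds$"))
}

fl <- file_list(tempfile())
fl[["A"]] <- c(1, 2)  # dispatches to `[[<-.file_list`, writes A.rds
fl[["A"]]             # dispatches to `[[.file_list`, reads it back
length(fl)            # 1
```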
getList creates an R object of class "largeList" and binds it to a file.
# with truncate = TRUE, the file is truncated if it exists
largelist_object <- getList("example.llo", verbose = TRUE, truncate = TRUE)
# with truncate = FALSE, the object binds to the existing file
largelist_object <- getList("example.llo", verbose = TRUE, truncate = FALSE)
The syntax for saving and appending differs slightly from that of ordinary lists.
# save a list
largelist_object[[]] <- list("A" = 1, "B" = 2)
# append a list
largelist_object[] <- list("C" = 3, "D" = 4)
As with ordinary lists, [] gets a sub-list and [[]] gets a single element.
# to print, just use largelist_object; to assign, use largelist_object[]
largelist_object
object_copy <- largelist_object[]
# by numeric indices
largelist_object[c(1, 2)]
largelist_object[[1]]
# by names
largelist_object[c("A", "E")]
largelist_object[["A"]]
# by logical indices
largelist_object[c(TRUE, FALSE)]
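For comparison, the same indexing on an ordinary R list: [ returns a sub-list, [[ returns a single element, and a short logical index is recycled.

```r
x <- list(A = 1, B = 2, C = 3, D = 4)
x[c("A", "B")]     # sub-list: list(A = 1, B = 2)
x[["A"]]           # single element: 1
x[c(TRUE, FALSE)]  # recycled to TRUE, FALSE, TRUE, FALSE: elements A and C
```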
As with ordinary lists, assign NULL to remove elements.
# by numeric indices
largelist_object[1] <- NULL
# by names
largelist_object["B"] <- NULL
# by logical indices
largelist_object[c(TRUE, FALSE)] <- NULL
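The same removals on an ordinary R list, starting from elements A to D, show the result to expect: only D survives, because the final logical index is recycled over the two remaining elements.

```r
x <- list(A = 1, B = 2, C = 3, D = 4)
x[1] <- NULL               # removes A
x["B"] <- NULL             # removes B
x[c(TRUE, FALSE)] <- NULL  # recycled over C, D: removes C
names(x)                   # "D"
```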
As with ordinary lists, elements are either modified or appended, depending on the indices.
largelist_object[[]] <- list("A" = 1, "B" = 2, "C" = 3, "D" = 4)
# by numeric indices
largelist_object[c(1, 5)] <- list(1, "E" = 5)
# by names
largelist_object[c("C", "F")] <- c(5, 7)
# by logical indices
largelist_object[c(TRUE, FALSE)] <- c(8)
print(largelist_object)
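On an ordinary R list, assigning to a name that already exists replaces that element, while a name that does not exist yet appends a new element:

```r
x <- list(A = 1, B = 2, C = 3, D = 4)
x[c("C", "F")] <- c(5, 7)  # C exists and becomes 5; F is new and is appended
names(x)                   # "A" "B" "C" "D" "F"
x$F                        # 7
```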
largelist_object[[]] <- list("A" = 1, "B" = 2)
# get names
names(largelist_object)
# modify names
names(largelist_object)[c(1, 2)] <- c("AA", "BB")
names(largelist_object)[c(FALSE, TRUE)] <- c("DD")
print(largelist_object)
Other functions such as print, length, head and tail are also available.
largelist_object[[]] <- list("A" = 1, "B" = 2)
# the maximum number of items to print can be changed with the option largeList.max.print
print(largelist_object)
length(largelist_object)
head(largelist_object)
tail(largelist_object)
# remove the object and the file
rm(largelist_object)
file.remove("example.llo")
Progress is reported to the console when operations take a long time. This can be switched off by setting the option largeList.report.progress to FALSE, e.g. options(list(largeList.report.progress = FALSE)).
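To switch progress output off only for one block of work, the option can be set temporarily and restored with on.exit(). without_progress is a hypothetical helper; only the option name comes from the package.

```r
# Hypothetical helper: evaluate expr with progress reporting off, then
# restore the previous option value (also if expr throws an error).
without_progress <- function(expr) {
  old <- options(largeList.report.progress = FALSE)
  on.exit(options(old))
  expr  # lazily evaluated here, after the option has been set
}

without_progress(getOption("largeList.report.progress"))  # FALSE
```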