# lvec: out of memory vectors in R

Core functionality for working with vectors (numeric, integer, logical and character) that are too large to keep in memory. The vectors are kept (partially) on disk using memory mapping. This package contains the basic functionality for working with these memory mapped vectors (e.g. creating, indexing, ordering and sorting) and provides C++ headers which can be used by other packages to extend the functionality provided in this package.

```library(lvec)
```

## Functionality

### Creating and converting to and from lvecs

The core function to create an lvec is `lvec`. It accepts two arguments: an length (`size`) and a type (`type`) which can have the following values: `numeric`, `integer`, `logical` and `character`. This will create an lvec with the given length of the given type. In case of `character` you will also have to specify a maximum string length (`strlen`) as strings are stored with a fixed width to speed up access. When assigning to a character lvec strings longer than the maximum string length are truncated.

Create an integer vector of length 100

```x <- lvec(100, type = "integer")
```

Get the first 10 values. Values are initialised to 0 by default.

```lget(x, 1:10)
```

Set the first 10 values to 11:20

```lset(x, 1:10, 11:20)
```

Set maximum length of the string to 1, strings longer than that get truncated. However, minimum value of strlen is 2 (this is needed to be able to store missing values).

```x <- lvec(10, type = "character", strlen = 1)
lset(x, 1:3, c("a", "foo", NA))
lget(x, 1:3)
```

An other function for creating lvecs, is `as_lvec` which will try to convert a R-vector (e.g. a numeric, integer, logical or character vector) to an lvec of the corresponding type. In case of a character vector the maximum string length is set equal to the maximum string length found in the input.

Convert a character vector to lvec:

```x <- as_lvec(letters)
lget(x, 1:26)
```

Finally, most functions operating on lvecs will return an lvec. In order to work directly in R with a result, this result needs to be converted to a regular R-vector. For this the function `as_rvec` can be used, which will take an lvec and return a vector of type `numeric`, `integer`, `logical` or `character`.

Convert `x`, defined above, back to an R-vector:

```as_rvec(x)
```

### Reading from and writing to lvecs

Subsets of an lvec can be selected using `lget`. Three types of selection are possible: using numeric indices and logical indices (as one would index an reguler R-vector), or using a range index (using the `range` argument). A range index is a numeric vector of length two giving the lower and upper bound of the range one selects (inclusive).

Getting the first 3 elements of an lvec

```x <- as_lvec(1:5)
lget(x, 1:3)
lget(x, c(TRUE, TRUE, TRUE, FALSE, FALSE))
lget(x, range = c(1,3))
```

`lget` always returns an lvec. When a regular R vector is needed `as_rvec` can be used to convert the lvec.

Values can be written (assigned) to an lvec using `lset`. The first parameter is the lvec, the second the (numeric or logical) indices and the last the new values. This function should behave as regular assignent in R (e.g. `x[1:3] <- 4` should behave as `lset(x, 1:3, 4)`).

Changing the first three elements:

```lset(x, 1:3, 33:31)
```

When indexing with a numeric index, values are recycled:

```lset(x, 1:3, 100)
```

When indexing with a logical index, the index is also recycled (to the length of the lvec) For example, setting the odd elements of an lvec to 1:

```lset(x, c(TRUE, FALSE), 1)
```

For both `lset` and `lget` the index and values can also be other lvecs. When they are not lvecs they are first converted to lvecs using `as_lvec`.

### Working with lvecs

The length of an lvec can be obtained and changed using `length`:

```x <- as_lvec(1:5)
length(x)
length(x) <- 3
print(x)
length(x) <- 5
print(x)
```

The maximum string length of an character lvec can be obtained and changed using `strlen`. When changing the maximum string length to a smaller value, string longer than the maximum string length are truncated.

```x <- as_lvec(c("foo", "bar", "foobar"))
strlen(x)
strlen(x) <- 3
print(x)
```

`is_lvec` can be used to check if an object is an lvec. With `lvec_type` the exact type of the lvec can be obtained: `numeric`, `integer`, `logical` or `character`.

Lvec objects can be seen as references to a piece of memory. Therefore, when an lvec is copied only the reference is copied and not the actual data; the copy refers to the same memory as the original. The advantage of this is that copying is fast (no data needs to be copied); the disadvantage is that modifying the copy will also modify the original. This behaviour is different from that of other R objects. This behaviour is demonstrated below:

```introduce_missing_values <- function(x) {
lset(x, c(TRUE, FALSE), NA)
x
}

a <- as_lvec(1:5)
print(a)
b <- introduce_missing_values(a)
print(b)
print(a)
```

With the `clone` function a true copy can be made of an lvec, it will create a new lvec and copy the data from the original lvec. Using `clone` the function in the example above can be given behaviour that is more in line with the behaviour of regular R functions:

```introduce_missing_values <- function(x) {
res <- clone(x)
lset(res, c(TRUE, FALSE), NA)
res
}

a <- as_lvec(1:5)
print(a)
b <- introduce_missing_values(a)
print(b)
print(a)
```

When speed is an issue, you could introduce a `clone` argument with a default value of `TRUE`. This will allow a user to skip the clone when it is not needed:

```introduce_missing_values <- function(x, clone = TRUE) {
if (clone) x - clone(x)
lset(x, c(TRUE, FALSE), NA)
x
}
a <- as_lvec(1:100)
a <- introduce_missing_values(a, clone = FALSE)
```

### Ordering and sorting

When working with large vectors (that don't fit into memory) ordering and sorting are important operations. Therefore, `order` and `sort` are also implemented for lvecs. These functions should behave as the reguler R functions. However, sorting of character lvecs is implemented in C++, and will therefore sorts using the local C-locale, which might be different from the locale used in R.

```x <- as_lvec(rnorm(10))
order(x)
lget(x, order(x))
sort(x)
```

### Working with attributes

In order to have support for factor and POSIXct (dates and times) objects, lvec tries to keep track of the original R attributes. A factor is internally stores as an integer vector with a `levels` and `class` attribute. A POSIXct object is a numeric vector with a `class` attribute. The attributes of the original R objects can be obtained and modifier using the `rattr` functions. When converting an lvec to an R vector, the attributes stored in rattr are applied to the R object.

```x <- as_lvec(factor(sample(letters[1:3], 10, replace=TRUE)))
print(x)
rattr(x)
rattr(x, "class")
as_rvec(x)

x <- as_lvec(as.POSIXct("2016-01-01", "2015-03-22"))
print(x)
rattr(x)
rattr(x, "class")
as_rvec(x)

x <- as_lvec(1:3)
rattr(x, "class") <- "factor"
rattr(x, "levels") <- c("a", "b", "c")
print(x)
```

### Reading and writing lvecs to disk

lvec objects can be written to disk and read back using `lsave` and `lread`. The lvec is divided into a number of chunks. Each chunk is written to an separate RDS file (see `saveRDS`). Finally, the names of the files, size of the lvec and some additional information to correctly read back the lvec is written to an RDS file. The filenames of the chunks have the following format: `<filename>.00001.RDS`, `<filename>.00002.RDS` etc.; and the file containing the background information has the name: `<filename>.RDS`. When the filename given to `lsave` has the extension `RDS` this extension is first removed. The lvec can be read back into memory using `lread` passing it the file containing the background information (`<filename>.RDS`).

```x <- as_lvec(1:10)
lsave(x, "tmp.RDS")
list.files(pattern = "*.RDS")
print(y)
```

## Linking to lvec from another R package

The lvec package contains only the basic functionality for working with lvec objects. The goal is that other packages can extend the functionality of the lvec package. Although most extensions can probably be built directly in R code using the functionality provided in lvec, sometimes it will be necessary to work directly with the C++ objects behing the lvec object. To support this most headers and some C++ functions are exported. Other packages can use these by having

```LinkingTo:
lvec
```

in their `DESCRIPTION` file. This makes the header files available for use by the package. They can the include all headers using:

```#include <lvec_interface.h>
```

in their C++ code.

djvanderlaan/lvec documentation built on Aug. 16, 2018, 11:02 a.m.