library(datastructures) library(purrr)

`datastructures`

implements various data structures that are frequently used in computer science. Among these are for example *Fibonacci heaps*, *stacks*, *queues* or *hashmaps*. Advanced data structures are essential
in many computer science and statistics problems, for example graph algorithms or string analysis. The package uses `Boost`

and `STL`

data types and extends these to `R`

with `Rcpp`

modules.

Fibonacci and binomial heaps are priority queue data structures using the minimum heap property. They can be represented using collections of trees or linked lists consisting of *nodes* of elements. Every node is a pair of keys and values, where the key decides the priority of the *node* in the heap. Fibonacci heaps have various applications, one of the most famous being in efficiently finding shortest paths using Dijkstra's algorithm. Fibonacci heaps can add values in amortized $\mathcal{O}(1)$ time and remove the minimum value in $\mathcal{O}(log \, n)$ time making them a good choice in many real world scenarios. Binomial heaps have in general slightly worse asymptotic bounds. However, removing the minimum element (`pop`

) has an asymptotically tight bound of $\Theta(log \, n)$. The two classes have the exact same methods, so every method explained here is available for the other class, too.

You can create a heap like this:

fheap <- fibonacci_heap("numeric", "character") bheap <- binomial_heap("numeric", "character")

This gives us heaps with `numeric`

keys and `character`

values. You can pick from either `integer`

, `numeric`

or `character`

keys and values. So if we insert several key-value pairs, the pair with the *minimum* key would have the highest *priority*. Let's insert some values and have a look:

keys <- sample(seq(0, 1, by=.2)) values <- paste0("V", keys) fheap <- insert(fheap, keys, values) size(fheap)

There are also some other ways to insert into a heap. If you want to insert a `vector`

ial value, do this:

fheap <- insert(fheap, -1, letters) peek(fheap)

If you want to add multiple values, you need to use a `vector`

ial key and a `matrix`

as value:

fheap <- insert(fheap, c(-2, -3), matrix(letters, 2)) peek(fheap)

Furthermore, you can use lists as values to insert into a heap:

fheap <- insert(fheap, c(-4, -5, -6), list("a", letters[1:4], "hello")) peek(fheap)

Since Fibonacci and binomial heaps use the minimum heap property the first element in the heap should be `0 -> V0`

even though we inserted the pairs in a random fashion.

peek(fheap) size(fheap)

That worked nicely. `peek`

gives us the first element from the heap *without removing it*. If we want to have it removed we call the `pop`

function. Of course this changes the priority of the heap:

pop(fheap) size(fheap)

You can alternatively insert values like this:

fheap[-10] <- "V-1" peek(fheap)

This works for all the described `insert`

methods above. Even though we usually
don't need to retrieve all values from a heap, you can do so using the `values`

method:

values(fheap)

Sometimes, we want to decrease a key and rearrange its position in the
heap. If we have a *unique* key this can be easily done:

decrease_key(fheap, from=-10, to=-11) peek(fheap)

The function takes vectors, so you can also pass vectorial `to`

and `from`

arguments:

decrease_key(fheap, from=c(-5, -4), to=c(-15, -15)) peek(fheap)

However, things get more complicated if we have multiple identical keys and want to decrease a specific one. In this case, in order to decrease to correct node, we need to get its `handle`

first:

```
handle(fheap, -15)
```

The `handle`

method returns the `id`

of the node and the `value`

that is stored
for a given `key`

(in this case -15). Thus if we want to decrease a node with ambiguous key, we also need to specify the handle (e.g. the `id`

):

hand <- handle(fheap, -15) decrease_key(fheap, -15, -20, hand[[1]]$handle) peek(fheap)

The `handle`

method always returns a list where every element represents a node in the heap. Every list element is again a list consisting of the node's handle (its id). So you need to check the node's value to figure out the
corresponding handle. Furthermore, if you want to query by a node's `value`

you would call the `handle`

method like this:

fheap <- insert(fheap, 10, "V-10") fheap <- insert(fheap, 10, "V-10") h <- handle(fheap, value="V-10") h decrease_key(fheap, 10, -100, h[[1]]$handle) peek(fheap)

In all of these scenarios neither the key nor the value need to be unique, because the handle (node id) always is.

As a last example, consider this use case:

library('purrr') bheap <- binomial_heap("character", "integer") bheap <- insert(bheap, letters[c(2:6, 5, 5)], c(2:6, 5L, 7L)) bheap <- insert(bheap, "x", 1:3) bheap <- insert(bheap, "z", 1:3) peek(bheap) vector.keys <- handle(bheap, value = 1:3) %>% purrr::map_chr(.f = function(x) x$key) vector.keys decrease_key(bheap, from=vector.keys, to=c("a", "b")) peek(bheap) hand <- handle(bheap, key = "b") hand decrease_key(bheap, from="b", to="a", handle=hand[[2]]$handle) pop(bheap) pop(bheap)

Maps use hash functions to store key-value pairs. Given a key, the hash function computes a *bucket* where the value is stored, making lookup, insertion and deletion on average possible in $\mathcal{O}(1)$ time. In the worst case these runtimes decrease to $\mathcal{O}(n)$ time, where $n$ is the number of entries in a bucket. The difference between the three classes is that hashmaps and multimaps compute a mapping from keys to values (the hash function)
$$ \, f: \mathcal{K} \rightarrow \mathcal{V} \, $$
in **'one direction'**, while bimaps also compute a mapping
$$ \, f: \mathcal{V} \rightarrow \mathcal{K} \, $$
in the **'other direction'**. So a bimap is essentially a combination of two hashmaps. The difference between a hashmap and a multimap is that for hashs only *one* unique key can be stored for a value, while multimaps allow storing the same key several times fur multiple values,

Maps can get created like this, this time we use `<character, integer>`

pairs:

hash <- hashmap("character", "integer") bimap <- bimap("character", "integer") mm <- multimap("character", "integer")

Insertion to maps works just like before, by calling the `insert`

function:

keys <- paste0("V", 1:5) values <- 1:5 hash <- insert(hash, keys[1:4], values[1:4]) bimap <- insert(bimap, keys[1:4], values[1:4]) mm <- insert(mm, keys[1:4], values[1:4]) hash[keys[5]] <- values[5] bimap[keys[5]] <- values[5] mm[keys[5]] <- values[5] mm[keys[5]] <- values[1:5]

Values (and keys in the case of bimaps) can then be accessed like this:

get(hash, keys[1]) hash[keys[1]] get(bimap, keys[1], "values") get(bimap, values[2], "keys") get(bimap, keys[1]) get(mm, keys[5]) mm[keys[5]]

For hashmaps\multimaps the subset-operator `[`

is also supported for accessing elements. This does not make so much sense for bimaps, because here we have to choose whether we want to access keys or values. In the example on the top we showed how both is possible by either providing `keys`

or `values`

as third argument to the `get`

function. If you leave it out, then the third argument defaults to `values`

.

If you want to directly access `keys`

or `values`

as vectors or have a look at a few random elements, you would call:

```
keys(bimap)
values(hash)
head(bimap)
```

Queues and stackes are two list datastructures using the `STL`

's `std::deque`

in the backend, i.e. insertion at the end and getting the first element can be done in constant time $\mathcal{O}(1)$ . Queues and stacks are for example used in *depths-first-search* and *breadth-first-search* making them versatile datastructures. While queues use the *first-in-first-out* principle, stacks use the *last-in-first-out* principle.

The two datatypes use the exact same methods. You can instantiate a stack or a queue using:

qu <- queue("numeric") st <- stack("numeric")

As before, you can pick either `numeric`

, `integer`

or `character`

keys. Now, let's again insert some data:

keys <- seq(0, 1, by=.2) print(keys) qu <- insert(qu, keys) st <- insert(st, keys)

**BEWARE:** If you add *one* `vector`

to queues and stacks it will add *one* element:

peek(qu) peek(st)

If you want to add *multiple* elements, you need to add a list:

st <- insert(st, list(1, rnorm(5))) pop(st) pop(st)

The `pop`

operation removed the *head* element from the data structure.

For a `queue`

`pop`

and `peek`

return the *first* element that got inserted, while a stack returns the *last* element that was inserted.

- Simon Dirmeier [email protected]

**Any scripts or data that you put into this service are public.**

Embedding an R snippet on your website

Add the following code to your website.

For more information on customizing the embed code, read Embedding Snippets.