
Function to perform random intersection trees. When two binary data matrices `z` (class 1) and `z0` (class 0) are supplied, it searches for interactions. More precisely, since the data matrices are binary, each row of each matrix can be represented by the set of column indices with non-zero entries. The function searches for sets (interactions) that are more prevalent in class 1 than class 0, and then sets that are more prevalent in class 0 than class 1. When given a single binary matrix `z` with the argument `z0` omitted, the function simply finds sets with high prevalence. Prevalences of interactions returned are estimated using min-wise hashing.
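As a rough illustration of how min-wise hashing can estimate the prevalence of a set, the base R sketch below compares an exact prevalence with a min-hash estimate. It is a generic sketch, not the package's implementation: the helper names `exact_prevalence`, `minhash_matrix` and `minhash_prevalence` are hypothetical, and for simplicity the size of the union of the column supports is computed exactly rather than estimated.

```r
set.seed(1)
n <- 400
p <- 6
z <- matrix(rbinom(n * p, 1, 0.5), n, p)

# Exact prevalence: fraction of rows whose index set contains every variable in `set`
exact_prevalence <- function(z, set) {
  mean(rowSums(z[, set, drop = FALSE]) == length(set))
}

# Min-wise hash matrix with L rows: for each of L random permutations of the
# observations, entry (l, k) is the smallest permuted label among the rows in
# which column k is non-zero (assumes every column has a non-zero entry)
minhash_matrix <- function(z, L) {
  n <- nrow(z)
  t(replicate(L, {
    perm <- sample.int(n)
    apply(z, 2, function(col) min(perm[col != 0]))
  }))
}

# The hashes of the variables in `set` all agree with probability
# |intersection| / |union| of their supports, so the agreement rate times the
# fraction of rows in the union estimates the prevalence of `set`
minhash_prevalence <- function(z, H, set) {
  agree <- mean(apply(H[, set, drop = FALSE], 1, function(h) all(h == h[1])))
  union_frac <- mean(rowSums(z[, set, drop = FALSE]) > 0)
  agree * union_frac
}

H <- minhash_matrix(z, L = 2000)
exact <- exact_prevalence(z, c(1, 2))
est <- minhash_prevalence(z, H, c(1, 2))
c(exact = exact, estimate = est)
```

Increasing `L` in this sketch reduces the variance of the estimate at the cost of building a larger hash matrix.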


| Argument | Description |
|---|---|
| `z` | data matrix where each row corresponds to an observation and columns correspond to variables. Can be in sparse matrix format (inherit from class "sparseMatrix" in the Matrix package). |
| `z0` | optional second data matrix with the same number of columns as |
| `branch` | average number of branches to use when creating each tree. |
| `depth` | maximum depth of trees. |
| `n_trees` | number of trees to be constructed. |
| `theta0` | when searching for sets of variables that are more prevalent in class 1 than class 0, the maximum threshold for prevalence in class 0. |
| `theta1` | as above but with class 1 and class 0 interchanged. |
| `min_inter_sz` | minimum size of the interactions to be returned. |
| `L` | number of rows of the min-wise hash matrix used to estimate prevalences. A larger value will result in more accurate estimates, but computation time will increase linearly with |
| `n_cores` | number of cores for parallel processing. Only used when OpenMP is installed. |
| `output_list` | if |
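To give a feel for what `branch`, `depth`, `n_trees` and `min_inter_sz` control, here is a much simplified single-class sketch of the random intersection idea described in the reference below: each tree starts from the index set of a random observation, and every child node intersects its parent's set with the index set of another random observation. The function `rit_sketch` is a hypothetical name, it uses a fixed rather than average number of branches, and it is not the package's implementation.

```r
# Simplified sketch of random intersection trees for a single binary matrix
rit_sketch <- function(z, branch = 3, depth = 3, n_trees = 20, min_inter_sz = 2) {
  # Represent each row by the set of column indices with non-zero entries
  rows <- lapply(seq_len(nrow(z)), function(i) which(z[i, ] != 0))
  found <- list()
  for (t in seq_len(n_trees)) {
    # Root node: index set of one randomly chosen observation
    nodes <- list(rows[[sample.int(length(rows), 1)]])
    for (d in seq_len(depth)) {
      children <- list()
      for (s in nodes) {
        for (b in seq_len(branch)) {
          # Child node: parent's set intersected with a random observation
          child <- intersect(s, rows[[sample.int(length(rows), 1)]])
          # Prune sets that are already too small to be returned
          if (length(child) >= min_inter_sz) {
            children[[length(children) + 1]] <- child
          }
        }
      }
      nodes <- children
      if (length(nodes) == 0) break
    }
    # Sets surviving to the deepest level are the candidate interactions
    found <- c(found, nodes)
  }
  unique(found)
}

set.seed(1)
z <- matrix(rbinom(200 * 30, 1, 0.2), 200, 30)
z[, 1] <- 1
z[, 2] <- 1  # variables 1 and 2 appear in every observation
sets <- rit_sketch(z)
head(sets)
```

Because variables 1 and 2 are present in every row of this toy matrix, every surviving intersection contains both of them. Less prevalent variables tend to drop out as the depth grows.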

There are two tasks which can be performed with this function depending on whether or not `z0` is supplied (note `z` must always be supplied).

1. If `z0` is omitted, the function finds prevalent sets in `z`, and `theta0` and `theta1` are ignored.

2. If `z0` is supplied, it searches for sets that are prevalent in `z` but have prevalence at most `theta0` in `z0`. Next, sets that are prevalent in `z0` but have prevalence in `z` at most `theta1` are found.
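The selection criterion in the second task can be made concrete with a brute-force check over all pairs of columns in a small toy example. This only illustrates the criterion: the function itself searches with randomized trees rather than enumerating sets, and the helper `prevalence`, the cut-off `min_prev` and the threshold value chosen for `theta0` are assumptions made for this sketch.

```r
set.seed(1)
z  <- matrix(rbinom(1000 * 8, 1, 0.3), 1000, 8)
z0 <- matrix(rbinom(1000 * 8, 1, 0.3), 1000, 8)
z[, 1] <- z[, 2]  # the pair (1, 2) has prevalence about 0.3 in z only

# Exact prevalence of a set: fraction of rows containing all of its variables
prevalence <- function(m, set) {
  mean(rowSums(m[, set, drop = FALSE]) == length(set))
}

# Enumerate all pairs of columns and compute their prevalence in each class
pairs <- t(combn(ncol(z), 2))
res <- data.frame(
  Interaction = apply(pairs, 1, paste, collapse = " "),
  prev_z  = apply(pairs, 1, function(s) prevalence(z, s)),
  prev_z0 = apply(pairs, 1, function(s) prevalence(z0, s)),
  stringsAsFactors = FALSE
)

theta0 <- 0.15   # maximum prevalence allowed in class 0 (toy value)
min_prev <- 0.2  # hypothetical cut-off for "prevalent" in class 1

# Sets prevalent in z whose prevalence in z0 is at most theta0
class1 <- res[res$prev_z >= min_prev & res$prev_z0 <= theta0, ]
class1
```

Swapping the roles of `z` and `z0` and using a threshold in place of `theta1` gives the corresponding check for the second search.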

If `output_list` is `FALSE` (the default), the output is either a data frame (if `z0` is omitted) or a list of two data frames (if `z0` is supplied). The first column of each data frame is a character vector of interaction sets, with the variables in each set separated by spaces; the second column contains the estimated prevalences. When `z0` is supplied, the first component of the list, named `Class1`, contains the interactions that are prevalent in `z`, with their prevalences in `z`. The second component, named `Class0`, contains the interactions prevalent in `z0`, with their prevalences in `z0`.

When `output_list` is `TRUE`, each interaction is reported as an integer vector, so the collection of interactions is a list of such vectors.
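If the default data frame output is already in hand, the space-separated interaction strings can be converted to integer vectors with base R. The sketch below builds a small data frame shaped like the default output, using values taken from the example output on this page, rather than calling the function itself.

```r
# A data frame shaped like the default output
out <- data.frame(
  Interaction = c("1 2", "214 220", "145 412"),
  Prevalence  = c(0.2645503, 0.2389937, 0.2341772),
  stringsAsFactors = FALSE
)

# Split each interaction string on spaces and convert to integer vectors
sets <- lapply(strsplit(out$Interaction, " ", fixed = TRUE), as.integer)

# Convert back to the space-separated character representation
labels <- vapply(sets, paste, character(1), collapse = " ")
```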

Hyun Jik Kim, Rajen D. Shah

Shah, R. D. and Meinshausen, N. (2014) Random Intersection Trees. *Journal of Machine Learning Research*, **15**, 629–654.

```
## Generate two binary matrices
z <- matrix(rbinom(250*500, 1, 0.3), 250, 500)
z0 <- matrix(rbinom(250*500, 1, 0.3), 250, 500)
## Make the first and second cols of z identical
## so the set 1, 2 has prevalence roughly 0.3 compared
## to roughly 0.09 for any other pair of columns
z[, 1] <- z[, 2]
## Similarly for z0
z0[, 3] <- z0[, 4]
## Market basket analysis
out1 <- RIT(z)
out1[1:5, ]
## Finding interactions
out2 <- RIT(z, z0)
out2$Class1[1:5, ]
out2$Class0[1:5, ]
## Can also perform the above using sparse matrices
if (require(Matrix)) {
  S <- Matrix(z, sparse=TRUE)
  S0 <- Matrix(z0, sparse=TRUE)
  out3 <- RIT(S, S0)
}
```

```
  Interaction Prevalence
1         1 2  0.2645503
2     214 220  0.2389937
3     145 412  0.2341772
4      79 214  0.2125000
5      60 412  0.2117647
  Interaction Prevalence
1         1 2  0.2469136
2      61 412  0.2183908
3     145 353  0.2169312
4      87 497  0.2142857
5     105 211  0.2080925
  Interaction Prevalence
1         3 4  0.3436426
2     218 391  0.2216216
3     237 311  0.2203390
4     174 389  0.2142857
5     352 441  0.2142857
Loading required package: Matrix
```
