The `astrid` R package, short for Automatic STRucture IDentification, provides an implementation of the method described in

*Henelius, Andreas, Puolamäki, Kai and Ukkonen, Antti. Finding Statistically Significant Attribute Interactions. 2016*, available from arXiv.

The basic idea is to use classifiers to investigate class-dependent attribute interactions in datasets.

To get a BibTeX entry, run `citation("astrid")` in R once the package is installed.

The development version of the `astrid` package can be installed from GitHub as follows.

First install the `devtools` package and load it:

```
install.packages("devtools")
library(devtools)
```

You can now install the `astrid` package:

```
install_github("bwrc/astrid-r")
```

This short example demonstrates how to use the library. Here we analyse the following synthetic dataset:

The dataset has two classes, each with 500 samples. The data is generated so that attributes a1 and a2 must be used jointly to predict the class (leftmost panel), while attribute a3 carries some (weak) class information (middle panel). Attribute a4 (rightmost panel) is just noise. The known class-dependent attribute interaction structure is hence given by *((a1, a2), (a3), (a4))*.
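To make this structure concrete, the following base R sketch builds a toy dataset with the same kind of interaction pattern. It is only an illustration: `make_toy_dataset` and its parameters (`n_per_class`, `shift`, `seed`) are hypothetical names, and the generator actually used by `make_synthetic_dataset` in `astrid` may differ.

```r
## Hypothetical helper (not part of astrid): a1 and a2 are informative only
## when used together, a3 is weakly informative on its own, a4 is pure noise.
make_toy_dataset <- function(n_per_class = 500, shift = 0.6, seed = 42) {
  set.seed(seed)
  n <- 2 * n_per_class
  class <- rep(c(0, 1), each = n_per_class)

  ## XOR-like pattern: in class 0 the signs of a1 and a2 agree,
  ## in class 1 they are opposite. Each attribute alone looks the
  ## same in both classes.
  s1 <- sample(c(-1, 1), n, replace = TRUE)
  s2 <- ifelse(class == 0, s1, -s1)
  a1 <- s1 + rnorm(n, sd = 0.3)
  a2 <- s2 + rnorm(n, sd = 0.3)

  ## a3: class means differ by `shift` (assumed value, weak signal)
  a3 <- rnorm(n, mean = shift * class)

  ## a4: noise, independent of the class
  a4 <- rnorm(n)

  data.frame(a1 = a1, a2 = a2, a3 = a3, a4 = a4, class = factor(class))
}

toy <- make_toy_dataset()
table(toy$class)
```

Because the sign of a1 is drawn independently of the class, a classifier that sees only a1 or only a2 cannot do better than chance, whereas one that sees both can separate the classes.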

```
## Load the library
library(astrid)
library(e1071)
library(randomForest)
## Create a synthetic dataset with the known
## attribute interaction structure
## ((a1, a2), (a3), (a4)), where attribute a4 is just noise.
dataset <- make_synthetic_dataset(N = 500, seed = 42, mg2 = 0.6)
## Perform the analysis using the ASTRID algorithm
res <- analyze_dataset(dataset, classname = "class", classifier = "svm", parallel = TRUE, R = 250)
## Print the results as an HTML table
print_result_table_html(res, full_tree = TRUE)
```

This gives the following results for the analysis of the synthetic dataset using the SVM classifier:

| k | acc | p | a3 | a4 | a2 | a1 |
|---|------|------|----|----|----|----|
| 2 | 0.89 | 0.71 | A | B | B | B |
| 3 | 0.88 | 0.78 | A | B | C | C |
| 4 | 0.73 | 0.00 | A | B | C | D |

In this table *k* is the size (cardinality) of the grouping, that is, the number of groups; *acc* is the average accuracy of the classifier when trained using a dataset randomised using this grouping; and *p* is the statistical significance (p-value) of the grouping. The remaining columns each denote one attribute (here *a3*, *a4*, *a2* and *a1*). In each row, attributes marked with the same letter belong to the same group.
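As a rough illustration of what "randomised using this grouping" can mean, the sketch below permutes the rows of each attribute group independently within each class, so that relations inside a group are preserved and relations between groups are broken. This is an assumption about the randomisation scheme, and `permute_grouping` is a hypothetical helper, not a function exported by `astrid`; the package's internal implementation may differ.

```r
## Hypothetical helper (not part of astrid): permute each attribute group
## independently within each class, keeping attributes of a group together.
permute_grouping <- function(data, classname, grouping) {
  out <- data
  for (cl in unique(data[[classname]])) {
    rows <- which(data[[classname]] == cl)
    for (group in grouping) {
      ## One permutation per group and class, shared by all attributes
      ## in the group
      perm <- rows[sample.int(length(rows))]
      out[rows, group] <- data[perm, group, drop = FALSE]
    }
  }
  out
}

## Small example: a1 and a2 form one group, a3 is a group of its own
set.seed(1)
small <- data.frame(a1 = 1:6, a2 = 11:16, a3 = 21:26,
                    class = factor(rep(c("x", "y"), each = 3)))
shuffled <- permute_grouping(small, "class", list(c("a1", "a2"), "a3"))
```

In `shuffled`, the pairs (a1, a2) still come from the same original row, while a3 is shuffled separately, and no value moves from one class to the other.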

This shows that the maximum-cardinality grouping with a p-value of at least 0.05 is for k = 3, where the grouping is *((a1, a2), (a3), (a4))*. The structure found by the ASTRID algorithm matches the model used to create the data.
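The selection rule in the previous paragraph can be written out in a few lines of base R. The data frame below is typed in by hand from the result table; it is not the structure of the `res` object returned by `analyze_dataset`, and the names `results`, `alpha`, `accepted` and `best` are illustrative.

```r
## Values copied by hand from the result table (not read from `res`)
results <- data.frame(
  k   = c(2, 3, 4),
  acc = c(0.89, 0.88, 0.73),
  p   = c(0.71, 0.78, 0.00)
)

## Keep groupings with a p-value of at least 0.05 ...
alpha <- 0.05
accepted <- results[results$p >= alpha, ]

## ... and pick the one with the largest number of groups
best <- accepted[which.max(accepted$k), ]
best$k
```

Here `best$k` is 3, matching the grouping *((a1, a2), (a3), (a4))*.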

The `astrid` R package is licensed under the MIT license.

bwrc/astrid-r documentation built on June 24, 2017, 8:05 p.m.
