
View source: R/random.transactions.R

Simulates a random `transactions` object using different methods.

```
random.transactions(nItems, nTrans, method = "independent", ...,
  verbose = FALSE)
```

| Argument | Description |
| --- | --- |
| `nItems` | an integer. Number of items. |
| `nTrans` | an integer. Number of transactions. |
| `method` | name of the simulation method used (default: all items occur independently). |
| `...` | further arguments used for the specific simulation method (see details). |
| `verbose` | report progress. |

The function generates an `nItems` times `nTrans` transaction database.

Currently two simulation methods are implemented:

- method `"independent"` (see Hahsler et al., 2006): All items are treated as independent. The transaction size is determined by *rpois(lambda-1)+1*, where `lambda` can be specified (defaults to 3). Note that one is subtracted from lambda and added to the size to avoid empty transactions. The items in the transactions are randomly chosen using the numeric probability vector `iProb` of length `nItems` (default: 0.01 for each item).

- method `"agrawal"` (see Agrawal and Srikant, 1994): This method creates transactions with correlated items and uses the following additional parameters:

  - `lTrans`: average length of transactions.
  - `nPats`: number of patterns (potential maximal frequent itemsets) used.
  - `lPats`: average length of patterns.
  - `corr`: correlation between consecutive patterns.
  - `cmean`: mean of the corruption level (normal distr.).
  - `cvar`: variance of the corruption level.
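The size rule and item selection of the `"independent"` method described above can be illustrated in base R. This is only a sketch under stated assumptions, not the package's implementation: the variable names `sizes` and `trans_list` are made up for the illustration, and drawing items without replacement via `sample()` is one plausible reading of "randomly chosen using `iProb`".

```r
## Sketch only: mimics the documented behaviour of method "independent"
## with base R; it is NOT the code used inside arules.
set.seed(1)                      # reproducibility of this illustration

nItems <- 200
nTrans <- 1000
lambda <- 3                      # documented default
iProb  <- rep(0.01, nItems)      # documented default: 0.01 for each item

## Transaction sizes: rpois(lambda - 1) + 1, so no transaction is empty
## while the expected size stays at lambda.
sizes <- rpois(nTrans, lambda - 1) + 1

## Assumption: items are drawn without replacement, weighted by iProb.
trans_list <- lapply(sizes, function(k) {
  sort(sample(nItems, min(k, nItems), prob = iProb))
})

summary(sizes)
```

Shifting the Poisson draw by one is what guarantees a minimum size of 1; drawing from `rpois(lambda)` directly would occasionally yield empty transactions.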

The simulation is a two-stage process. First, a set of `nPats` patterns (potential maximal frequent itemsets) is generated. The length of the patterns is Poisson distributed with mean `lPats`, and consecutive patterns share some items controlled by the correlation parameter `corr`. For later use, a pattern weight is generated for each pattern by drawing from an exponential distribution with a mean of 1, and a corruption level is chosen from a normal distribution with mean `cmean` and variance `cvar`.

The patterns are created using the following function:

`random.patterns(nItems, nPats = 2000, method = "agrawal", lPats = 4, corr = 0.5, cmean = 0.5, cvar = 0.1, iWeight = NULL, verbose = FALSE)`

The function returns the patterns as an `itemsets` object which can be supplied to `random.transactions` as the argument `patterns`. If no argument `patterns` is supplied, the default values given above are used.

In the second step, the transactions are generated using the patterns. The length of the transactions follows a Poisson distribution with mean `lTrans`. For each transaction, patterns are randomly chosen using the pattern weights until the transaction length is reached. For each chosen pattern, the associated corruption level is used to drop some items before adding the pattern to the transaction.
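The two-stage process can be sketched in base R as follows. This is a simplified illustration, not the arules implementation: it ignores the correlation between consecutive patterns (`corr`), normalizes the pattern weights to sum to 1, clips corruption levels to [0, 1], and drops each item of a chosen pattern independently with probability equal to the corruption level. The helper `make_transaction` and the small parameter values are assumptions made for the example.

```r
## Sketch only: a simplified two-stage simulation in base R.
## Assumptions (not taken from arules): no correlation between patterns,
## per-item independent dropping, and a cap on the number of attempts.
set.seed(42)

nItems <- 50; nPats <- 20; nTrans <- 10
lPats <- 4; lTrans <- 10; cmean <- 0.5; cvar <- 0.1

## Stage 1: patterns with Poisson lengths (at least one item each),
## exponential weights (mean 1, then normalized) and normal corruption levels.
pat_len  <- pmax(1, rpois(nPats, lPats))
patterns <- lapply(pat_len, function(k) sample(nItems, min(k, nItems)))
weights  <- rexp(nPats, rate = 1)
weights  <- weights / sum(weights)
corrupt  <- pmin(1, pmax(0, rnorm(nPats, mean = cmean, sd = sqrt(cvar))))

## Stage 2: fill each transaction with corrupted patterns until the
## target length is reached (or the attempt cap is hit).
make_transaction <- function() {
  target <- max(1, rpois(1, lTrans))
  items  <- integer(0)
  tries  <- 0
  while (length(items) < target && tries < 100) {
    p     <- sample(nPats, 1, prob = weights)
    pat   <- patterns[[p]]
    keep  <- pat[runif(length(pat)) >= corrupt[p]]  # drop items by corruption
    items <- union(items, keep)
    tries <- tries + 1
  }
  sort(items)
}

trans_list <- replicate(nTrans, make_transaction(), simplify = FALSE)
```

Because whole patterns are added at once, a transaction in this sketch can end up slightly longer than its target length.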

Returns an object of class `transactions`.

Michael Hahsler

Michael Hahsler, Kurt Hornik, and Thomas Reutterer (2006). Implications of
probabilistic data modeling for mining association rules. In M. Spiliopoulou,
R. Kruse, C. Borgelt, A. Nuernberger, and W. Gaul, editors, *From Data and
Information Analysis to Knowledge Engineering, Studies in Classification, Data
Analysis, and Knowledge Organization*, pages 598–605. Springer-Verlag.

Rakesh Agrawal and Ramakrishnan Srikant (1994). Fast algorithms for mining
association rules in large databases. In Jorge B. Bocca, Matthias Jarke, and
Carlo Zaniolo, editors, *Proceedings of the 20th International Conference
on Very Large Data Bases, VLDB*, pages 487–499, Santiago, Chile.

```
library("arules")

## generate 1000 random transactions for 200 items with
## a success probability decreasing from 0.2 to 0.0001
## using the method described in Hahsler et al. (2006).
trans <- random.transactions(nItems = 200, nTrans = 1000,
  lambda = 5, iProb = seq(0.2, 0.0001, length = 200))

## size distribution
summary(size(trans))

## display random data set
image(trans)

## use the method by Agrawal and Srikant (1994) to simulate transactions
## which contain correlated items. This should create data similar to
## T10I4D100K (we just create 100 transactions here to speed things up).
patterns <- random.patterns(nItems = 1000)
summary(patterns)

trans2 <- random.transactions(nItems = 1000, nTrans = 100,
  method = "agrawal", patterns = patterns)
image(trans2)

## plot data with items ordered by item frequency
image(trans2[, order(itemFrequency(trans2), decreasing = TRUE)])
```

```
Loading required package: Matrix

Attaching package: 'arules'

The following objects are masked from 'package:base':

    abbreviate, write

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  1.000   4.000   5.000   5.029   6.000  14.000
set of 2000 itemsets

most frequent items:
item652 item461 item768 item883 item425 (Other)
     62      57      57      47      45    7759

element (itemset/transaction) length distribution:sizes
  1   2   3   4   5   6   7   8   9  10  11
106 284 447 441 359 195 104  36  18   5   5

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  1.000   3.000   4.000   4.013   5.000  11.000

summary of quality measures:
    pWeights           pCorrupts
 Min.   :1.780e-07   Min.   :0.0000
 1st Qu.:1.429e-04   1st Qu.:0.2851
 Median :3.357e-04   Median :0.4953
 Mean   :5.000e-04   Mean   :0.4945
 3rd Qu.:7.015e-04   3rd Qu.:0.7012
 Max.   :5.027e-03   Max.   :1.0000

includes transaction ID lists: FALSE
```
