Class “transactions” — Binary Incidence Matrix for Transactions
The transactions class represents transaction data used for
mining itemsets or rules. It is a direct extension of class
itemMatrix to store a binary incidence
matrix, item labels, and optionally transaction IDs and user IDs.
Transactions can be created by coercion from lists containing transactions, but also from matrices and data.frames. However, you will need to prepare your data first: association rule mining can only use items and does not work with continuous variables.
For example, an item describing a person (i.e., the considered object, called a transaction) could be tall. The fact that the person is tall would be encoded by the transaction containing the item tall. In a transaction-by-items matrix this is typically encoded by a
TRUE value. This is why coercion with
as(x, "transactions") can deal with logical columns: it assumes that each such column stands for an item. The coercion can also convert columns with nominal values (i.e., factors) into a series of binary items (one for each level). So if you have nominal variables, make sure they are factors (and not characters or numbers), using something like
data[,"a_nominal_var"] <- factor(data[,"a_nominal_var"]).
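A minimal sketch of this preparation step, assuming the arules package is installed. The data frame `people` and its columns `tall` and `eye` are hypothetical. The logical column should become a single item named after the column, while the character column is first turned into a factor so that each of its levels becomes an item.

```r
library(arules)

# Hypothetical data: one row per person (i.e., per transaction)
people <- data.frame(
  tall = c(TRUE, FALSE, TRUE),
  eye  = c("blue", "brown", "blue"),
  stringsAsFactors = FALSE
)

# Nominal values stored as characters are converted to a factor first
people[, "eye"] <- factor(people[, "eye"])

# Logical column -> one item "tall" (only TRUE values create the item)
# Factor column  -> one item per level, labelled "eye=blue" and "eye=brown"
trans <- as(people, "transactions")
itemLabels(trans)
inspect(trans)
```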
Continuous variables need to be discretized first. An item resulting from discretization might be age>18, and the corresponding column contains only
TRUE and FALSE. Alternatively, it can be a factor with levels age<=18, 50>=age>18 and age>50. These will be automatically converted into 3 items, one for each level. Have a look at the function
discretize for automatic discretization.
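The factor-based alternative can be sketched with discretize(), assuming a recent arules release in which discretize() accepts the arguments method = "fixed", breaks, labels and right. The ages, cut points and level labels below are made up for illustration; right = TRUE makes the intervals closed on the right, so 18 falls into the first level.

```r
library(arules)

# Hypothetical continuous variable
age <- c(6, 17, 18, 25, 50, 51, 70)

# Fixed cut points give the intervals (-Inf,18], (18,50] and (50,Inf]
age_d <- discretize(age, method = "fixed",
                    breaks = c(-Inf, 18, 50, Inf),
                    labels = c("<=18", "18-50", ">50"),
                    right = TRUE)
table(age_d)

# The resulting factor is converted into one item per level
trans <- as(data.frame(age = age_d), "transactions")
itemLabels(trans)
```

Choosing the breaks by hand keeps the items interpretable; the other discretize() methods pick cut points from the data instead.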
Complete examples for how to prepare data can be found in the man pages for
Transactions are represented as sparse binary matrices of class
itemMatrix. If you work with several transaction sets at the
same time, then the encoding (order of the items in the binary matrix) in the different sets is important.
See itemCoding to learn how to encode and recode transaction sets.
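As an illustration, two transaction sets built independently usually end up with different item codings. The sketch below, assuming arules' recode() with its match argument (documented under itemCoding), aligns the second set with the first. The sets t1 and t2 are hypothetical.

```r
library(arules)

# Two hypothetical transaction sets built independently
t1 <- as(list(c("a", "b"), c("b", "c")), "transactions")
t2 <- as(list(c("c"), c("a", "c")), "transactions")

# The codings differ: t2 only knows the items it contains
itemLabels(t1)
itemLabels(t2)

# Recode t2 so that it uses the same items in the same order as t1
t2r <- recode(t2, match = t1)
itemLabels(t2r)
```

After recoding, the incidence matrices of t1 and t2r have identical columns, so they can be compared or combined directly.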
Objects from the Class
Objects are created by coercion from objects of other classes
(see Examples section) or by
calls of the form
a data.frame with one row per transaction (each transaction is considered an itemset). The data.frame can hold columns with additional information, e.g., transaction IDs or user IDs for each transaction. Note: this slot is inherited from class
itemMatrix, but should be accessed in transactions with the method
object of class
ngCMatrix to store the binary incidence matrix (see
a data.frame to store item labels (see
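A short sketch of working with the transaction information through its accessor rather than the slot, assuming arules is installed. The transaction IDs and the userID column are hypothetical; the replacement data.frame needs one row per transaction.

```r
library(arules)

trans <- as(list(c("a", "b"), c("b", "c")), "transactions")

# Store transaction IDs and hypothetical user IDs (one row per transaction)
transactionInfo(trans) <- data.frame(
  transactionID = c("T1", "T2"),
  userID        = c("u1", "u2"),
  stringsAsFactors = FALSE
)

# Read the information back through the accessor method
transactionInfo(trans)
```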
signature(from = "matrix", to = "transactions"); produces a transactions data set from a binary incidence matrix. The column names are used as item labels and the row names are stored as transaction IDs.
signature(from = "transactions", to = "matrix"); coerces the transactions data set into a binary incidence matrix.
signature(from = "list", to = "transactions"); produces a transactions data set from a list. The names of the items in the list are used as item labels, and the item IDs and the incidence matrix are produced automatically.
signature(from = "transactions", to = "list"); coerces the transactions data set into a list of transactions. Each transaction is a vector of character strings (names of the contained items).
signature(from = "data.frame", to = "transactions"); recodes a data frame containing only categorical variables (factors) or logicals into a binary transaction data set. For logical variables, only TRUE values are converted into items, and the item label is the variable name. For factors, a dummy item for each level is automatically generated. Item labels are generated by concatenating variable names and levels with "=". The original variable names and levels are stored in the itemInfo data frame as the components
levels. Note that
NAs are ignored (i.e., they do not generate an item).
signature(from = "transactions", to = "data.frame"); represents the set of transactions in a printable form as a data.frame. Note that this does not reverse the coercion from data.frame to transactions.
signature(from = "ngCMatrix", to = "transactions"); Note that the ngCMatrix needs to have the items as rows!
- dimnames, rownames, colnames
signature(x = "transactions"); returns row (transactionID) and column (item) names.
signature(x = "transactions"); returns the labels for the itemsets in each transaction (see
signature(x = "transactions"); replaces the transaction information with a new data.frame.
signature(x = "transactions"); returns the transaction information as a data.frame.
signature(object = "transactions")
signature(object = "transactions")
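The coercion methods listed above can be exercised together in one short sketch, assuming arules is installed. The objects m and df and all item names are made up for illustration. The matrix is given with transactions as rows and items as columns, which is how recent arules releases interpret a matrix (unlike the ngCMatrix case, which expects items as rows).

```r
library(arules)

# Logical incidence matrix: rows are transactions, columns are items
m <- matrix(c(TRUE, TRUE, FALSE,
              FALSE, TRUE, TRUE),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("T1", "T2"), c("a", "b", "c")))
t_mat <- as(m, "transactions")
size(t_mat)                 # number of items in each transaction
as(t_mat, "matrix")         # back to a binary incidence matrix

# List of transactions, each a character vector of item labels
t_list <- as(list(T1 = c("a", "b"), T2 = c("b", "c")), "transactions")
as(t_list, "list")          # list of character vectors again

# Data frame with a factor containing NA and a logical column
df <- data.frame(color = factor(c("red", NA, "blue")),
                 ok    = c(TRUE, FALSE, TRUE))
t_df <- as(df, "transactions")
size(t_df)                  # NA and FALSE values generate no items
itemInfo(t_df)              # item labels with the original variables and levels

# Printable form: one row per transaction; this does not restore df
as(t_df, "data.frame")
```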
Examples
## example 1: creating transactions from a list
a_list <- list(
  c("a","b","c"),
  c("a","b"),
  c("a","b","d"),
  c("c","e"),
  c("a","b","d","e")
  )

## set transaction names
names(a_list) <- paste("Tr", c(1:5), sep = "")
a_list

## coerce into transactions
trans1 <- as(a_list, "transactions")

## analyze transactions
summary(trans1)
image(trans1)

## example 2: creating transactions from a matrix
## (rows are transactions, columns are items)
a_matrix <- matrix(c(
  1,1,1,0,0,
  1,1,0,0,0,
  1,1,0,1,0,
  0,0,1,0,1,
  1,1,0,1,1
  ), ncol = 5, byrow = TRUE)

## set dim names
dimnames(a_matrix) <- list(paste("Tr", c(1:5), sep = ""),
  c("a","b","c","d","e"))
a_matrix

## coerce
trans2 <- as(a_matrix, "transactions")
trans2
inspect(trans2)

## example 3: creating transactions from data.frame
a_df <- data.frame(
  age   = as.factor(c(6, 8, NA, 9, 16)),
  grade = as.factor(c("A", "C", "F", NA, "C")),
  pass  = c(TRUE, TRUE, FALSE, TRUE, TRUE))
## note: factors are translated differently to logicals and NAs are ignored
a_df

## coerce
trans3 <- as(a_df, "transactions")
inspect(trans3)
as(trans3, "data.frame")

## example 4: creating transactions from a data.frame with
## transaction IDs and items
a_df3 <- data.frame(
  TID  = c(1,1,2,2,2,3),
  item = c("a","b","a","b","c","b"),
  stringsAsFactors = FALSE
  )
a_df3
trans4 <- as(split(a_df3[,"item"], a_df3[,"TID"]), "transactions")
trans4
inspect(trans4)