df2SPMFSequence: Sequence creating function

Description

This function is called directly by the user to create sequences from a data frame in a format compliant with SPMF. It calls either ToSequence or ToTimedSequence. It returns a list with two elements: toSendSPMF contains a variable named sequence that respects the SPMF input format for frequent sequence mining, with or without time; evLev is the table matching the encoded items back to their original names.
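The encoding described above can be illustrated with a small standalone sketch. It does not call the package and is not its implementation: the toy data frame, the `lookup` table (standing in for the role of evLev, whose real layout is not documented here) and the helper `encode_one` are all hypothetical. It relies only on the plain SPMF sequence format, in which `-1` closes an itemset and `-2` closes a sequence.

```r
# Toy data: one row per purchased item (hypothetical column names).
toy <- data.frame(
  ID   = c("a", "a", "a", "b", "b"),
  day  = c(1, 1, 3, 2, 5),
  item = c("bread", "milk", "eggs", "milk", "bread"),
  stringsAsFactors = FALSE
)

# Lookup table from integer codes to the original item names.
# It plays the role the documentation gives to evLev (layout assumed).
lev <- sort(unique(toy$item))
lookup <- data.frame(code = seq_along(lev), item = lev)
toy$code <- match(toy$item, lev)

# One SPMF line per ID: items of an itemset are separated by spaces,
# "-1" closes each itemset and "-2" closes the sequence.
encode_one <- function(d) {
  d <- d[order(d$day, d$code), ]
  sets <- tapply(d$code, d$day, paste, collapse = " ")
  paste0(paste(sets, "-1", collapse = " "), " -2")
}
spmf_lines <- vapply(split(toy, toy$ID), encode_one, character(1))

print(lookup)
print(unname(spmf_lines))
# "1 3 -1 2 -1 -2" "3 -1 1 -1 -2"
```

Items bought on the same day end up in the same itemset, and the lookup table is what lets mined patterns be translated back to item names.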

Usage

df2SPMFSequence(df, ID, itemset = "", event = "", time = "",
  timeFormat = "", timestep = 1, parallel = F, timeUnit = "auto")

Arguments

df

a data frame from which to create sequences

ID

the name of the column of IDs; sequences are built per ID

itemset

the name of the column of itemsets, that is, of the products bought together. You need to provide at least one of the itemset or time parameters

time

the name of the column where the time of an event is stored. You need to provide at least one of the itemset or time parameters

timeFormat

the format in which the time column is encoded (for example "%d-%m-%Y"). If provided, the function will assume you want time to be taken into account. To build the proper format, please refer to the man page of strptime (via ?strptime)

timestep

an integer by which to divide the time at which an event occurs in a sequence. If your times are expressed in days, setting timestep to 7 will express this delay in weeks, de facto grouping all items of the same week (sliding 7 days from the first item)

parallel

if TRUE, the function will use all the cores of your system to parallelize the creation of your sequences. Default is FALSE (F), because the gain depends on the number of cores and the length of the data frame

timeUnit

the time unit in which time differences will be rendered in timed sequences. Default is "auto"
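The interplay of timeFormat and timestep can be sketched with base R alone. The example below parses dates with the "%d-%m-%Y" format quoted above, measures the delay in days from the first event, then divides by a timestep of 7. The use of integer division is an assumption made for illustration; the rounding rule the package actually applies is not stated in this page.

```r
# Parse day-month-year strings with a strptime-style format.
raw <- c("26-05-2019", "28-05-2019", "02-06-2019", "10-06-2019")
dates <- as.Date(raw, format = "%d-%m-%Y")

# Days elapsed since the first event of the sequence.
elapsed <- as.numeric(dates - min(dates))
print(elapsed)  # 0 2 7 15

# Assumed grouping rule: integer division by timestep,
# so events less than 7 days after the first one share bucket 0.
timestep <- 7
week <- elapsed %/% timestep
print(week)     # 0 0 1 2
```

With timestep left at 1 the delays would stay in days, and the first two events would no longer fall in the same group.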

Value

df2SPMFSequence returns a list. The toSendSPMF element contains a tibble/data frame whose variable sequence holds all the sequences in the proper format for export to a txt file readable by the SPMF Java library. The evLev element is the table matching the encoded items to the original item names.
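As a minimal sketch of the export step, the snippet below writes the sequences to a text file with base R. The object `res` is a hand-built stand-in for the returned list, not real output of df2SPMFSequence: its structure (a sequence column in toSendSPMF, and the columns of evLev) is assumed from the description above, and the values are invented for illustration.

```r
# Hypothetical stand-in for the list returned by df2SPMFSequence.
res <- list(
  toSendSPMF = data.frame(
    sequence = c("1 3 -1 2 -1 -2", "3 -1 1 -1 -2"),
    stringsAsFactors = FALSE
  ),
  evLev = data.frame(code = 1:3, item = c("bread", "eggs", "milk"))
)

# Write one sequence per line to a txt file.
out <- tempfile(fileext = ".txt")
writeLines(res$toSendSPMF$sequence, out)

# Read the file back to check its content.
written <- readLines(out)
print(written)
```

The resulting file has one sequence per line, which is the layout SPMF expects for its sequence input files.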

Examples

# seqDF is a data frame for testing the functions. It contains the variables
# ID, jour, ITEMSETS and PRODUITSnum, used here as an example.
test <- df2SPMFSequence(seqDF, ID = "ID", time = "jour", event = "PRODUITSnum",
  itemset = "ITEMSETS", timeFormat = "%d", parallel = FALSE)

MGousseff/r2spmf documentation built on May 26, 2019, 11:58 p.m.