GrpString-package: Patterns and Statistical Differences Between Two Groups of...



The package provides methods for converting series of event names to strings, finding common patterns in a group of strings, discovering featured patterns when comparing two groups of strings (along with the number and starting position of each pattern in each string), obtaining a transition matrix, computing transition entropy, statistically comparing two groups of strings, and clustering groups of strings.
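To make "transition matrix" and "transition entropy" concrete, the following is a minimal base-R sketch that does not call any GrpString function. It assumes each character of a string is one event, counts adjacent character pairs, and then computes a plain Shannon entropy over the observed transitions. The variable names are illustrative only, and the package's own functions may define or normalize the entropy differently.

```r
# Illustrative only: base R, not the GrpString implementation.
s <- "ABACB"

# Split the string into single-character events
chars <- strsplit(s, "")[[1]]
states <- sort(unique(chars))

# Pair each event with the one that follows it
from <- factor(chars[-length(chars)], levels = states)
to   <- factor(chars[-1], levels = states)

# Rows are the "from" event, columns the "to" event
trans_mx <- table(from, to)

# Assumption: Shannon entropy (in bits) over the observed transition proportions
p <- as.vector(trans_mx) / sum(trans_mx)
p <- p[p > 0]
entropy <- -sum(p * log2(p))

print(trans_mx)
print(entropy)
```

In this sketch the string "ABACB" yields four distinct transitions (A to B, B to A, A to C, C to B), each observed once.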

Event names can be any action names or labels such as events in log files or areas of interest (AOIs) in eye tracking research.
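As a rough illustration of what converting event names to a string means, the sketch below maps each distinct event name to a single letter and concatenates the result. It uses base R only. The event names and the letter assignment are made up for this example and are not necessarily how EveStr or EveString encode events.

```r
# Illustrative only: hypothetical event names, base R.
events <- c("menu", "search", "menu", "results", "search")

# Assign one letter per distinct event, in order of first appearance
# (this simple scheme supports at most 26 distinct events)
labels <- unique(events)
codes <- setNames(LETTERS[seq_along(labels)], labels)

# Look up each event's letter and join the letters into one string
event_string <- paste(codes[events], collapse = "")
print(event_string)
```

Once a sequence is encoded this way, every event occupies one character, so patterns of events become ordinary substrings.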


Package: GrpString
Type: Package
Version: 0.3.2
Date: 2017-08-15
License: GPL-2

Some functions come in two or more variants, e.g., one returning a data frame or a vector and the other exporting one or more .txt files to the current directory. The former is the simple version; the latter can be considered a generalized or more complex version of it. The exporting variants exist because some data sets are large (e.g., many rows or columns), and because exported files make the results easier to view and manage when more than one data set is produced. Example function pairs are EveStr - EveString, CommonPatt - CommonPattern, and PatternInfo - FeaturedPatt.

In addition, to save the users' effort, the function EveString takes an input file (.txt or .csv) instead of a data frame, because the input data are more conveniently stored in a file than in a data frame. We suggest that users copy the relevant input files (including eve1d.txt and eve1d.csv) to a different directory, because the function exports files to the same directory in which the input files are located.


Hui Tang, Norbert J. Pienta

Maintainer: Hui (Tom) Tang <>


# Discover common patterns in a group of strings
strs.vec <- c("ABCDdefABCDa", "def123DC", "123aABCD", "ACD13", "AC1ABC", "3123fe")
CommonPatt(strs.vec, low = 30)

Example output

   Pattern Freq_grp Percent_grp Length Freq_str Percent_str
33    ABCD        3      50.00%      4        2      33.33%
32     ABC        4      66.67%      3        3      50.00%
1      123        3      50.00%      3        3      50.00%
50     BCD        3      50.00%      3        2      33.33%
85     def        2      33.33%      3        2      33.33%
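The frequency columns can be cross-checked by hand. The sketch below is a base-R recount for a single pattern, assuming Freq_grp is the total number of occurrences across the group and Freq_str is the number of strings containing the pattern at least once. That reading is inferred from the output above rather than taken from the package source, and `count_pattern` is a hypothetical helper, not a GrpString function.

```r
# Illustrative only: recount one pattern with base R.
strs.vec <- c("ABCDdefABCDa", "def123DC", "123aABCD", "ACD13", "AC1ABC", "3123fe")

# Hypothetical helper: literal, non-overlapping matches of `pattern` per string
count_pattern <- function(pattern, strings) {
  hits <- gregexpr(pattern, strings, fixed = TRUE)
  # gregexpr returns -1 when there is no match, so count positive positions only
  per_string <- vapply(hits, function(h) sum(h > 0), integer(1))
  c(Freq_grp = sum(per_string), Freq_str = sum(per_string > 0))
}

abcd <- count_pattern("ABCD", strs.vec)
print(abcd)  # Freq_grp 3, Freq_str 2

# Assumption: both percentages are relative to the number of strings in the group
print(sprintf("%.2f%%", 100 * abcd / length(strs.vec)))
```

Under these assumptions the recount for "ABCD" agrees with the first row of the table: 3 occurrences (50.00%) in 2 strings (33.33%).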

GrpString documentation built on May 2, 2019, 12:38 p.m.