suppressPackageStartupMessages({
  library(magrittr)
  library(dplyr)
  library(minilexer)
})
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Scrabble game format: gcg

The gcg file format is a human readable representation of a Scrabble game.

In its most basic form, it has comments up the top (preceded by the # character) and then the following rows represent a move by each player in turn.

An example gcg file is show below:

gcg_text <- '
#player1 Quackle Quackle Computer
#player2 David David Boys
#description Quackle Computer plays David Boys in Round 1 at the 2006 Human vs. Computer Showdown
#title 2006 Human vs. Computer Showdown Round 1
#incomplete
>Quackle: DEMJNOT  8d   JETON           +40   40
>David: ?EDYEIG   h2  rEDYEING        +64   64
>Quackle: BEDGMNP  7e   BEDIM           +26   66  BE, ET, DO
>David: HEALERS   j1  HEALERS         +75  139  BEDIMS
>Quackle: DFGINPS   k3  DIF             +29   95  AD, LI, EF
>David: COOAORS   l1  COOS            +28  167  ADO, LIS
>Quackle: EGNOPRS   m3  SPONGER         +92  187  ADOS, LISP
>David: AORWAVA  6c   AVOW            +37  204  OBE, WET
>Quackle: AEFMOVZ  8l   MEZE            +54  241
>David: AARTUNY   d8  JAUNTY          +32  236
>Quackle: ACFIOOV  1l   COOF            +27  268
>David: WALTIER  4c   WAILED          +20  256
>Quackle: AACEINV  3a   VIA             +22  290  AW
>David: IRUTRUT   a3  VIRTU            +9  265
>Quackle: AACEHLN  8a   EH              +42  332  VIRTUE
>David: QUBITUR  2b   BRUIT           +32  297  BI, RAW
>Quackle: AACILNR  9m   RAN             +16  348  ZA, EN
>David: PQUIEN? 13a   QUEY            +32  329
>Quackle: CALLIER   c13 EL               +2  350
>David: PINIR?N  1e   PIN             +11  340  PI, IT
>Quackle: ACEILOR 15a   CALORIE         +83  433  ELL
>David: TRAING? 14f   TRAdING         +67  407  TI, RE
>David:               (DATSXK)        +36  443
'

Use lex() to turn the text into tokens

  1. Start by defining the regular expression patterns for each element in the gcg file.
  2. Use minilexer::lex() to turn the gcg text into tokens
gcg_patterns <- c(
  comment       = '(#.*?)\n',                 # Assume # only appears to denote comment to end of line
  newline       = '\n',
  whitespace    = '\\s+',
  player        = '>(.*?):',                  # start of each line with a `>`
  location      = '[a-o]\\d+|\\d+[a-o]|--|-', # Number first for horizontal words. -/-- for specials
  number        = minilexer::pattern_number,
  symbol        = '[-+\\w\\./\\?\\(\\)]+',
  comma         = ","
)

tokens <- minilexer::lex(gcg_text, gcg_patterns)
tokens <- tokens[!(names(tokens) %in% c('whitespace', 'newline', 'comment'))]
tokens[1:23]


coolbutuseless/minilexer documentation built on May 14, 2019, 6:09 a.m.