Package kaya
is a set of R tools for managing large collections of Go games stored in the Smart Game Format (SGF) plaintext markup language. The name comes from kayawood, which in Japan is considerd one of the best materials to make Go boards from.
A standard SGF might take the following form:
(;GM[1] FF[4] SZ[19] PW[supertjc] WR[7d] PB[High55] BR[6d] DT[2009-09-01] PC[The KGS Go Server at http://www.gokgs.com/] KM[0.50] RE[B+Resign] RU[Japanese] OT[5x10 byo-yomi] CA[UTF-8] ST[2] AP[CGoban:3] TM[0] HA[2] AB[pd][dp] ;W[pm];B[df];W[pq];B[kq];W[qf];B[nc];W[ch];B[qe];W[pf];B[pi];W[ne];B[mi];W[mc];B[mb];W[nb];B[ob];W[nd];B[na];W[oc];B[pc];W[nb];B[lc];W[nc];B[qb];W[lb];B[ma];W[kb];B[ld];W[jd];B[lf];W[ri];B[hd];W[ib];B[hb];W[jf];B[mg];W[re];B[of];W[hf];B[fd];W[cd];B[bf];W[ee];B[gf];W[ed];B[hg];W[ig];B[he];W[if];B[gg];W[fc];B[gc];W[fe];B[ge];W[fb];B[eg];W[ic];B[pe];W[cl];B[ii];W[gb];B[cj];W[el];B[di];W[fq];B[fp];W[gp];B[fo];W[dq];B[cq];W[er];B[dr];W[eq];B[cp];W[iq];B[cr];W[mq];B[ko];W[bj];B[bk];W[cn];B[go];W[hp];B[in];W[ck];B[bi];W[dj];B[ci];W[bl];B[aj];W[ej];B[bo];W[hk];B[jr];W[ir];B[gr];W[gq];B[gl];W[kp];B[lp];W[jp];B[lq];W[lo];B[mp];W[nq];B[kn];W[ln];B[op];W[km];B[io];W[jq];B[oq];W[or];B[pr];W[nr];B[lm];W[mm];B[ll];W[mn];B[qq];W[np];B[pp];W[no];B[fk];W[gm];B[hl];W[hm];B[im];W[il];B[gk];W[fm];B[jl];W[ik];B[gi];W[en];B[do];W[dm];B[rm];W[qo];B[ro];W[rp];B[qn];W[rd];B[rj];W[rh];B[qj];W[rb];B[rc];W[sc];B[qc];W[qa];B[oa];W[ph];B[oe];W[oi];B[og];W[oh];B[hc];W[ha];B[mk];W[ji];B[jj];W[ki];B[qi];W[qh];B[pg];W[qg];B[sd];W[se];B[si];W[ok];B[oj];W[nj];B[pj];W[ni];B[nk];W[mj];B[lj];W[mh];B[li];W[lh];B[ng];W[nh];B[kj];W[lg];B[kf];W[kg];B[jh];W[kh];B[ih];W[jg];B[on];W[pn];B[oo];W[om];B[bn];W[bm];B[dn];W[jk];B[kl];W[rl];B[qp];W[qm];B[rn];W[sm];B[po];W[so];B[rq];W[nl];B[pk];W[ol];B[ql];W[pl];B[rk];W[qk];B[bd];W[bc];B[ql];W[mf];B[me];W[qk];B[ce];W[cc];B[ql];W[ke];B[nf];W[qk];B[sb];W[sa];B[ql];W[sl];B[sp];W[qk];B[ia];W[ja];B[ql];W[ho];B[hn];W[qk];B[ga];W[ql];B[ia];W[ad];B[ha];W[la];B[md];W[be];B[fa];W[bg];B[af];W[fg];B[ff];W[fh];B[ef];W[dh];B[eh];W[ei];B[bh];W[db];B[dd])
The markup represents keys as two-character capital letters, with matched values in square brakets immediately following the key. Semicolons are basically like line breaks, and the whole game record is represented by the rounded brackets at the start and finish.
Game moves appear at the bottom and are stored by the one-letter stone color (B for black and W for white), followed by "sgf coordinates", which show the row, then column, of the board as letters from coordinate aa
in the top left corner.
The read_sgf
function takes this format and transforms it into a named list on R.
my_game <- read_sgf("./inputs/2009-09-01-1.sgf") str(my_game)
Now it stores each tag as a list of named strings, e.g. my_game$DT
prints the date, r my_game$DT
. Personally I am not committed to maintaining the SGF key codes in the R object, which I find unhelpfully cryptic, and I will probably phase them out in the next version, in favor of more explicit names. We have added a few variables to this list as well, after loading it:
filename
indicates where the sgf is located, relative to the working directorykaya_notes
will indicate if any processing was done by the kaya package, e.g. the board was rotated into standard orientationhash_id
is a unique identifier for the game based entirely on the move sequence translated into standard orientation. If two games are identical by move, even with different metadata, the hash_id will be the same. This is true even if arbitrary rotations applied thanks to kaya's use of standard orientationn_moves
stores the number of played moves (ignoring setup stones, like in a handicap).In addition, we've added a subtable called moves
which includes all the information about the move string itself. The color, explicit move number, the board column and board row. Also comments embedded in the sgf are stored there. Setup stones, e.g. from a handicap, are stored at the beginning of this table all labelled as "move 0". They are also found by the tags AB
and AW
for black and white setup stones, respectively.
Each Go game is stored inside R as a named list. We can load many such with with an lapply call on read_sgf
:
my_games <- list.files("./inputs", pattern = "*.sgf$", full.names = TRUE) game_data <- lapply_pb(my_games, read_sgf) contents <- unlist(lapply(game_data, length))
We can then turn this into a set of relational tables like so
library(jsonlite) d_all <- game_data target_entries <- 1:length(d_all) d_chunk <- d_all[target_entries] dj <- toJSON(d_chunk) chunk_json <- paste("./json/test.json", sep="") writeLines(dj, chunk_json) d <- read_json(chunk_json, simplifyVector=TRUE) moves <- d$moves m1 <- rep(NA, length(moves)) m2 <- rep(NA, length(moves)) m3 <- rep(NA, length(moves)) for(j in 1:length(moves)){ if(nrow(moves[[j]])>0 & !is.null(moves[[j]]$column)){ m1[j] <- paste0( letters[moves[[j]]$column[1]], letters[moves[[j]]$row[1]] ) m2[j] <- paste0( letters[moves[[j]]$column[2]], letters[moves[[j]]$row[2]] ) m3[j] <- paste0( letters[moves[[j]]$column[3]], letters[moves[[j]]$row[3]] ) } } d <- d[,!colnames(d) %in% c("moves", "AB", "AW")] # apply(d, 2, function(z) table(unlist(lapply(z, length)))) library(yamltools) d <- vectorize(d) d$m1 <- m1 d$m2 <- m2 d$m3 <- m3 chunk_csv <- gsub("json", "csv", chunk_json) write.csv(d, chunk_csv, row.names=FALSE) print(chunk_csv)
This will read in game_data
as an unnamed list, each element being the output of read_sgf
for that file.
Unlike Chess, the Go board at the start of the game is completely symmetrical; you can apply any arbitrary rotation and it is the "same" game. In other words, it doesn't matter which side of the board you consider the "bottom".[^note] This introduces a problem in storing collections of games, because the games might contain the same moves but be transformed in some way, e.g. rotated 90 degrees clockwise, or reflected about a diagonal or central line.
To avoid these problems in the databasing, I introduce the concept of standard orientation in storing Go games, which holds that the first non-diagonal, non-axis move must occur in sector 1 in the following diagram. If it does not, apply the needed transformations to all the moves to make it appear in sector 1, and that is "standard orientation" for this set of moves.
library(kaya) plot_board(goban.color="white", line.color=gray(0.4)) abline(0,1, lwd=2) abline(20,-1, lwd=2) abline(v=10, lwd=2) abline(h=10, lwd=2) text(16, 12, "sector 1", cex=3) text(13, 17, "sector 2", cex=3) text(7, 17, "sector 3", cex=3) text(4, 12, "sector 4", cex=3) text(4, 8, "sector 5", cex=3) text(7, 3, "sector 6", cex=3) text(13, 3, "sector 7", cex=3) text(16, 8, "sector 8", cex=3)
Moves on the diagonals are not counted as within these sectors (technically they are all in "sector 0").
[^note]: at least, I don't think it matters. It is not standard to record where each player was sitting in a real board, but convention holds that the player start in the top right corner relative to their point of view. At any rate, this is not stored in the SGF file.
We can plot a game like so:
plot_game(game_data[[1]])
There are a number of options in the plot_game
function that allow for more kifu-like figures.
plot_game(game_data[[1]], number=TRUE, stop=50, goban.color = gray(0.9))
Here the handicap stones are "move 0", which is weird I should really remove that.
write_gif(game_data[[1]], "./test.gif", stop=50)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.