In choosunsick/kbodatamining: Kbo Data Mining and Analysis with R

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-"
)

`kbodatamining`

Package Introduction

The goal of kbodatamining package is analizing and visualizing korean baseball data(KBO data). This package is intended to make use of KBO data collected from Python to R.

Package Installation

You can install package kbodatamining from github with the following command.

# install.packages("devtools")
devtools::install_github("choosunsick/kbodatamining")

Package Functionality

There are two primary functions in this package. The first is to make the records of players box-score. for example:

library(kbodatamining)

# batter example
batter_boxscore(data = hanhwa_batter_2018,name = "이용규",yearly = 2018)
# pitcher example
pitcher_boxscore(data = hanhwa_pitcher_2018,name = "정우람",yearly = 2018)
# team hitting example
team_batter_boxscore(data = hanhwa_batter_2018,teamname = "한화",yearly = 2018)
# team pitching example
team_pitcher_boxscore(data = hanhwa_pitcher_2018,teamname = "한화",yearly = 2018)

The second is to visualize monthly or seasonal comparisons for each player. In addition, these two functions can be applied to each team. for example:

library(kbodatamining)
theme_set(theme_gray(base_family = "AppleGothic"))
# comparing batters
oneseason_batter_compare(data = hanhwa_batter_2018,playername1 = "이용규",playername2 = "정근우",yearly=2018)
# comparing pitchers
oneseason_pitcher_compare(data = hanhwa_pitcher_2018,playername1 = "정우람",playername2 = "송은범",yearly=2018)
# comparing batters of team
compare_batter_statistic_teams(hanhwa_batter_2018,"한화","LG",yearly=2018)
# comparing pitchers of team
compare_pitcher_statistic_teams(hanhwa_pitcher_2018,"한화","LG",yearly=2018)

You can find a detailed explanation of this function in the 'plotting example' document of the vignette folder.

Notes on data

How to collect KBO data from Python can be found at LINK. For reference, the sample data in the package includes the data of 2018 season Hanhwa team. This sample data is divided into 'hanhwa_batter_2018', 'hanhwa_pitcher_2018'. You can check the package's functions using sample data.

Json data processing (json data 처리과정)

If you see the link above and get a json file for the KBO game through Python, you will have to process the json file to use it in R. So I'll show you an example of loading a json file from R and making it to DATAFRAME. An example of the process will be shown through the data in the json_sample folder in this github. for example:

library(jsonlite)
library(kbodatamining)
# read json file
sample_set <- fromJSON("./json_sample/Hanhwa_normalseason_2018.json")
# make batter data dataframe 
# If you want pitch data, use json2pitcherdf instead of json2batterdf
hanhwa_batter_2018 <- do.call(rbind,lapply(names(sample_set),FUN = function(x)json2batterdf(sample_set,x)))
# organizing column order & column name
hanhwa_batter_2018 <- hanhwa_batter_2018[,c(18:21,11,16:17,1:9,24,23,22,13,12,10,15)]
colnames(hanhwa_batter_2018) <- c('date','away','home','doubleheader','name','team','position',
                                  'one','two','three','four','five','six','seven','eight','nine',
                                  'ten','eleven','twelve','ab','h','r','rbi')
head(hanhwa_batter_2018)

In addition, you need to create an id column of the players.

library("kbodatamining")
# read id list 
batter_id  <- read.csv("./player_id_list/batter_id_list.csv",stringsAsFactors = F)
# extract the appropriate id list for your data
batter_id_2018 <- batter_id[substr(batter_id$date,1,4)==2018,]
batter_id_hanhwa_2018 <- batter_id_2018[batter_id_2018$away=="한화"|batter_id_2018$home=="한화",]
# make id column 
hanhwa_batter_2018 <- get_player_id(hanhwa_batter_2018,batter_id_hanhwa_2018)
head(hanhwa_batter_2018)