Installation

To get up and running for the rain prediction kaggle competition, you can first install the kgRainPredictR package from github, via:

devtools::install_github("potterzot/kgRainPredictR")

It might be better though to just clone the git repository, since that will give you easy access to the package directory, data folder, tests, etc... If you do so, then from R studio you may want to build and reload the package by clicking on the button in Rstudio or just issuing R CMD INSTALL --no-multiarch --with-keep.source kgRainPredictR from your shell.

Getting/Loading the Data

you should first download the competition data and save it in a directory. The data-raw directory of this repository comes with a utility script to convert the data from zipped csv files to ff files. This is not strictly necessary. ffdf files may provide some speed boost, at least for initial load, but fread will allow for loading only select columns.

Once you have it in either csv or ffdf form, you can load it with: ``` {r eval=FALSE} library(ff) ffload("data/train", overwrite = TRUE) df_train <- list( data = as.data.frame(train[,c("Id", "minutes_past", "radardist_km", "Ref", "Expected")]), labels = as.data.frame(train[,c("Expected")]) )

If instead you have csv files, you can use `data.table` and `fread`:
``` {r eval=FALSE}
library(data.table)
df_train <- list(
  data = fread("data/train.csv", select = c(1,2,4))
  labels = fread("data/train.csv", select = c(24))
)

Reproducing the Sample Submission

kgRainPredictR includes marshall_palmer, a function to reproduce the data in the sample submission file. This calculates the expected rain as a function of the reflectivity, weighted by the time of measure.

``` {r eval=FALSE} library(ff) library(dplyr) library(lazyeval) library(kgRainPredictR) ffload("data/test", overwrite = TRUE, rootpath = "data/") df_test <- as.data.frame(test[,c("Id", "minutes_past", "Ref")])

get predicted values

res <- group_by(df_test, Id) %>% summarize(Expected = marshall_palmer(Ref, minutes_past))

write to submission file

create_submission(res, "submission.csv") ```



potterzot/kgRainPredictR documentation built on May 25, 2019, 11:24 a.m.