readIn: Read in raw text data files and save them as a by-time division on...


Description

The input raw text data file is downloaded from NCDC and is also available in the drsstl package under ./inst/extdata. The file is read in and divided by month, and the resulting by-month division is saved on HDFS.
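The idea of a by-month division can be illustrated locally in base R, without Hadoop. The sketch below is not the package's implementation: the toy records and the column names (station, year, month, tmax) are assumptions made for illustration and do not reflect the actual NCDC file layout.

```r
# Toy records standing in for parsed raw text. The column names
# (station, year, month, tmax) are hypothetical, not the NCDC layout.
raw <- data.frame(
  station = c("A", "B", "A", "B", "A", "B"),
  year    = c(2000, 2000, 2000, 2000, 2001, 2001),
  month   = c(1, 1, 2, 2, 1, 1),
  tmax    = c(30.1, 28.4, 35.0, 33.2, 29.7, 27.9)
)

# Divide by time: one subset per (year, month) pair that occurs in the data,
# each subset holding the observations of all stations for that month.
by_month <- split(raw, list(raw$year, raw$month), drop = TRUE)

# Inspect the keys of the division and the number of rows in each subset.
names(by_month)
lapply(by_month, nrow)
```

In readIn the division is carried out as a MapReduce job and the subsets are written to the output path on HDFS rather than kept in memory.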

Usage

readIn(input, output, info, cluster_control = mapreduce.control(),
  model_control = spacetime.control(), cshift = 1)

Arguments

input

The path of the input file on HDFS. It should be a raw text file.

output

The path of the output on HDFS. The output is the by-time division.

info

The path of the RData file on HDFS that contains all station metadata. Make sure to copy station_info.RData, which is also available in the drsstl package, to HDFS first using rhput.

cluster_control

Should be a list object generated by the mapreduce.control function. The list includes all parameters needed for the MapReduce job.

model_control

Should be a list object generated by the spacetime.control function. The list includes all necessary smoothing parameters for the nonparametric fitting.

cshift

Number of columns to shift when reading the raw text file.

Author(s)

Xiaosu Tong

Examples

## Not run: 
    rhput("./station_info.RData", "/tmp/station_info.RData")
    FileInput <- "/tmp/tmax.txt"
    FileOutput <- "/tmp/bymth"
    ccontrol <- mapreduce.control(
      libLoc=NULL, reduceTask=5, io_sort=100, slow_starts = 0.5,
      reduce_input_buffer_percent=0.9, reduce_parallelcopies=5,
      spill_percent=0.9, reduce_shuffle_input_buffer_percent = 0.9,
      reduce_shuffle_merge_percent = 0.5
    )
    readIn(
      FileInput, FileOutput, info="/tmp/station_info.RData", cluster_control=ccontrol
    )

## End(Not run)

XiaosuTong/drsstl documentation built on May 9, 2019, 11:06 p.m.