sstl_mr: Apply the sstl routine to a dataset stored on HDFS


Description

The original input dataset is a raw text file on HDFS. After a series of MapReduce jobs, the final fitting results are saved in the output_bymth and output_bystat subdirectories of the output path on HDFS.
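Because the results land in two fixed subdirectories under output, it can help to build those paths programmatically before reading them back. A minimal sketch, assuming Rhipe is installed and initialised with rhinit(), and assuming the result files are in a format that Rhipe's rhread() reads with its default settings. The helper names sstl_output_paths and read_sstl_output are hypothetical and not part of drSpaceTime.

```r
# Hypothetical helper: build the HDFS paths of the two result
# subdirectories that sstl_mr() writes inside `output`.
sstl_output_paths <- function(output) {
  c(
    bymth  = file.path(output, "output_bymth"),
    bystat = file.path(output, "output_bystat")
  )
}

# Hypothetical reader: pull one result set back into R with Rhipe.
# Assumes rhinit() has been called and that rhread() can parse the
# result files with its default arguments.
read_sstl_output <- function(output, which = c("bystat", "bymth")) {
  which <- match.arg(which)
  Rhipe::rhread(sstl_output_paths(output)[[which]])
}
```

For example, sstl_output_paths("/tmp/output") gives the two directories used by the run in the Examples section; calling read_sstl_output() requires a live Hadoop cluster.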

Usage

sstl_mr(input, output, stat_info, mlcontrol = spacetime.control(),
  clcontrol = mapreduce.control())

Arguments

input

The HDFS path of the raw input text file.

output

The HDFS path under which the final fitting results are written.

stat_info

The HDFS path of the RData file containing the metadata for all stations. Copy the station_info RData file to HDFS with rhput before calling sstl_mr.

mlcontrol

A list generated by the spacetime.control function, containing all smoothing parameters needed for the nonparametric fitting.

clcontrol

A list generated by the mapreduce.control function, containing the necessary Rhipe parameters as well as user-tunable MapReduce parameters.
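The stat_info argument expects an RData file that is already on HDFS. A minimal sketch of that preparation step: save the metadata object under the name station_info, then copy the file with Rhipe's rhput(). The data frame columns below (station.id, lon, lat, elev) and the helper copy_station_info are hypothetical placeholders; the real station_info object comes from your own station metadata.

```r
# Hypothetical station metadata; replace with the real station_info object.
station_info <- data.frame(
  station.id = c("S0001", "S0002"),
  lon  = c(-86.9, -87.1),
  lat  = c(40.4, 40.5),
  elev = c(187, 190),
  stringsAsFactors = FALSE
)

# Save the object under the name station_info into a local RData file.
local_file <- file.path(tempdir(), "station_info.RData")
save(station_info, file = local_file)

# Hypothetical wrapper around Rhipe's rhput() to copy the file to HDFS.
# Not run here: it needs a configured cluster and a session in which
# rhinit() has already been called.
copy_station_info <- function(local_file, hdfs_path) {
  Rhipe::rhput(local_file, hdfs_path)
}
# copy_station_info(local_file, "/tmp/station_info.RData")
```

The HDFS destination in the commented call matches the stat_info path used in the Examples section.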

Author(s)

Xiaosu Tong

Examples

## Not run: 
    # Smoothing parameters for the nonparametric fit
    mcontrol <- spacetime.control(
      vari="tmax", n=576, n.p=12, stat_n=7738,
      s.window=13, t.window = 241, degree=2, span=0.015, Edeg=2
    )
    # Rhipe parameters and user-tunable MapReduce parameters
    ccontrol <- mapreduce.control(
      libLoc= NULL, reduceTask=169, io_sort=512, BLK=128, slow_starts = 0.5,
      map_jvm = "-Xmx200m", reduce_jvm = "-Xmx200m",
      map_memory = 1024, reduce_memory = 1024,
      reduce_input_buffer_percent=0.4, reduce_parallelcopies=10,
      reduce_merge_inmem=0, task_io_sort_factor=100,
      spill_percent=0.9, reduce_shuffle_input_buffer_percent = 0.8,
      reduce_shuffle_merge_percent = 0.4
    )
    # Run the fit; station_info.RData must already be on HDFS (copied with rhput)
    sstl_mr(
      input = "/tmp/tmax.txt", output = "/tmp/output",
      stat_info = "/tmp/station_info.RData", mlcontrol = mcontrol,
      clcontrol = ccontrol
    )

## End(Not run)

XiaosuTong/drSpaceTime documentation built on May 9, 2019, 11:06 p.m.