make_blocks: Block-wise split of data into training and testing sets


Description

Creates a stratum vector based on a data.frame with n columns. If the data.frame has one column, strata are created based on clusters separated by quantiles. If the data.frame has two or more columns, strata are created based on k-medoid clusters (function 'pam' from package cluster). Instead of a data.frame, the argument 'npoints' can be provided; groups are then created by random sampling. An optimization algorithm (function 'gridSearch' from package NMOF) optimizes for equal stratum sizes.
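The one-column case can be illustrated with a minimal base-R sketch. It is not the package's implementation: the helper names `quantile_clusters` and `balance_strata` are hypothetical, and a simple greedy heuristic stands in for the 'gridSearch' optimization, so results may differ from what make_blocks returns.

```r
# Hypothetical helpers: an illustration only, NOT the code used by make_blocks.

# Step 1: cut one numeric variable into `nclusters` quantile-based clusters.
quantile_clusters <- function(x, nclusters) {
  breaks <- quantile(x, probs = seq(0, 1, length.out = nclusters + 1))
  # unique() guards against duplicated breaks when x contains many ties
  cut(x, breaks = unique(breaks), include.lowest = TRUE, labels = FALSE)
}

# Step 2: assign whole clusters to strata, largest cluster first, always
# filling the stratum that is currently smallest (greedy heuristic, used
# here in place of the gridSearch optimization named in the description).
balance_strata <- function(clusters, nstrat) {
  sizes <- sort(table(clusters), decreasing = TRUE)
  load <- numeric(nstrat)
  assignment <- integer(length(sizes))
  for (i in seq_along(sizes)) {
    target <- which.min(load)
    assignment[i] <- target
    load[target] <- load[target] + sizes[[i]]
  }
  names(assignment) <- names(sizes)
  # Map every observation to the stratum of its cluster
  unname(assignment[as.character(clusters)])
}

set.seed(1)
x <- rnorm(200)
cl <- quantile_clusters(x, nclusters = 20)
strata <- balance_strata(cl, nstrat = 4)
table(strata)
```

Because whole clusters are moved between strata, observations that are close in the clustering variable stay in the same stratum, which is the point of a block-wise split.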

Usage

make_blocks(nstrat = 4, df = data.frame(), nclusters = nstrat * 5,
  npoints = NA, pres = numeric())

Arguments

df

data.frame with n columns containing criteria for cluster building. Not necessary if argument npoints is supplied.

nclusters

number of clusters from which strata are built. Minimum: the number of strata; maximum: nrow(df)/10.

npoints

optional argument used if 'df' is not supplied: the number of points for which random sampling should be made.

nstrat

number of approximately equal-sized classes to separate groups in block-cross validation
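As a rough illustration of the 'npoints' path and of the documented bounds on 'nclusters', the following base-R sketch may help. Both function names (`random_strata`, `valid_nclusters`) are hypothetical and not part of the package; how make_blocks draws its random groups internally is not shown in this documentation.

```r
# Hypothetical sketch, not the package's code.

# Random, near-equal strata for `npoints` observations
random_strata <- function(npoints, nstrat = 4) {
  # rep_len() cycles 1..nstrat up to npoints; sample() shuffles the labels
  sample(rep_len(seq_len(nstrat), npoints))
}

# Check nclusters against the documented range [nstrat, nrow(df) / 10]
valid_nclusters <- function(nclusters, nstrat, n) {
  nclusters >= nstrat && nclusters <= n / 10
}

set.seed(42)
s <- random_strata(npoints = 103, nstrat = 4)
table(s)

# The default nclusters = nstrat * 5 with 500 rows of data
ok <- valid_nclusters(nclusters = 4 * 5, nstrat = 4, n = 500)
ok
```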

Value

Vector of length nrow(df) or npoints, with integers representing different strata
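A returned stratum vector could be used for block cross-validation roughly as follows. This is a sketch under stated assumptions: the data, the linear model, and the `strata` vector are simulated stand-ins (a shuffled label vector takes the place of an actual make_blocks result), so only the train/test looping pattern carries over.

```r
set.seed(7)

# Simulated stand-in data (hypothetical; not from the package)
dat <- data.frame(x = rnorm(120))
dat$y <- 2 * dat$x + rnorm(120, sd = 0.5)

# Stand-in for the vector returned by make_blocks(): integers 1..4
strata <- sample(rep_len(1:4, nrow(dat)))

# Hold out one stratum at a time, fit on the rest, score on the held-out block
rmse <- vapply(sort(unique(strata)), function(k) {
  train <- dat[strata != k, ]
  test  <- dat[strata == k, ]
  fit <- lm(y ~ x, data = train)
  sqrt(mean((test$y - predict(fit, newdata = test))^2))
}, numeric(1))

rmse
```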

Author(s)

Philipp Brun

Examples

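### Test out block generation function
# Assumes 'obs.sel' is a data.frame of observations whose columns include
# bio_01, bio_03 and forest_fraction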

# No layers supplied
strt.1=make_blocks(npoints=nrow(obs.sel))
table(strt.1)

# Stratified by 1d layer a
strt.2=make_blocks(df=obs.sel[,2,drop=FALSE],nstrat=5,nclusters=5)
table(strt.2)

# Stratified by 1d layer b
strt.3=make_blocks(df=obs.sel[,2,drop=FALSE],nstrat=5,nclusters=15)
table(strt.3)

# Stratified by 2d layer a
strt.4=make_blocks(df=obs.sel[,c("bio_01","bio_03")],nstrat=3,nclusters=3)
table(strt.4)

# Stratified by 2d layer b
strt.5=make_blocks(df=obs.sel[,c("bio_01","bio_03")],nstrat=5,nclusters=15)
table(strt.5)

# Stratified by 3d layer
strt.6=make_blocks(df=obs.sel[,c("bio_01","bio_03","forest_fraction")],nstrat=5,nclusters=15)
table(strt.6)

par(mfrow=c(3,2))
plot(obs.sel[,c(2,3)],col=strt.1)
plot(obs.sel[,c(2,3)],col=strt.2)
plot(obs.sel[,c(2,3)],col=strt.3)
plot(obs.sel[,c("bio_01","bio_03")],col=strt.4)
plot(obs.sel[,c("bio_01","bio_03")],col=strt.5)
plot(obs.sel[,c("bio_01","bio_03")],col=strt.6)
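The object obs.sel used in these examples is not defined on this page. To try the calls without the original data, a simulated stand-in along these lines may work. The column order, value ranges, and the 'id' column are assumptions made for illustration only; the column names bio_01, bio_03 and forest_fraction are taken from the examples.

```r
set.seed(123)
n <- 300

# Simulated stand-in for obs.sel (hypothetical values, NOT the original data).
# Column 2 is bio_01 and column 3 is bio_03, so that obs.sel[, 2] and
# obs.sel[, c(2, 3)] select numeric layers as in the examples.
obs.sel <- data.frame(
  id = seq_len(n),
  bio_01 = rnorm(n, mean = 8, sd = 3),
  bio_03 = runif(n, min = 20, max = 40),
  forest_fraction = runif(n)
)

str(obs.sel)
```

With 300 rows, the largest nclusters value used in the examples (15) stays below the documented maximum of nrow(df)/10.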

filBe87/PKUss documentation built on June 29, 2019, 12:12 a.m.