cvsegments | R Documentation |
The function generates a list of segments for cross-validation. It can generate random, consecutive and interleaved segments, and supports keeping replicates in the same segment.
cvsegments(
N,
k,
length.seg = ceiling(N/k),
nrep = 1,
type = c("random", "consecutive", "interleaved"),
stratify = NULL
)
N |
Integer. The number of rows in the data set. |
k |
Integer. The number of segments to return. |
length.seg |
Integer. The length of the segments. If given, it overrides k. |
nrep |
Integer. The number of (consecutive) rows that are replicates of the same object. Replicates will always be kept in the same segment. |
type |
One of "random", "consecutive" or "interleaved". |
stratify |
Either a |
If length.seg is specified, it is used to calculate the number of segments to generate. Otherwise k must be specified. If k*length.seg ≠ N, the last k*length.seg - N segments will contain only length.seg - 1 indices.
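The arithmetic above can be sketched in base R. This is a minimal illustration rather than the package's own code: it assumes that, when length.seg is given, the number of segments is taken as ceiling(N / length.seg), and the variable names are chosen here for illustration only.

```r
# Illustrative only: segment counts for N = 50 rows and a requested length of 3.
N <- 50
length.seg <- 3

# Assumption: the number of segments is the smallest k that covers all rows.
k <- ceiling(N / length.seg)

# Number of segments that end up one index short of length.seg.
incomplete <- k * length.seg - N

# Sizes of the individual segments: complete ones first, short ones last.
sizes <- c(rep(length.seg, k - incomplete), rep(length.seg - 1, incomplete))

k            # 17
incomplete   # 1
sum(sizes)   # 50, every row is used exactly once
```

Under these assumptions, 50 rows with length.seg = 3 give 16 segments of three indices and one segment of two.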
If type is "random", the indices are allocated to segments in random order. If it is "consecutive", the first segment will contain the first length.seg indices, and so on. If type is "interleaved", the first segment will contain the indices 1, k+1, 2*k+1, ..., (length.seg-1)*k+1, and so on.
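The three allocation schemes can be pictured with a small base R sketch. It is not the package's implementation. It assumes that N is an exact multiple of k and that interleaved allocation deals the rows to the segments in turn. The object names are hypothetical.

```r
# Illustrative only: N = 9 rows split into k = 3 segments of length 3.
N <- 9
k <- 3
length.seg <- N / k

# "consecutive": each segment is a contiguous block of rows.
consecutive <- unname(split(seq_len(N), rep(seq_len(k), each = length.seg)))

# "interleaved" (assumed round-robin): rows are dealt to the segments in turn.
interleaved <- unname(split(seq_len(N), rep(seq_len(k), times = length.seg)))

# "random": shuffle the rows first, then cut the shuffled order into blocks.
set.seed(1)  # only to make this sketch reproducible
random <- unname(split(sample(N), rep(seq_len(k), each = length.seg)))

consecutive[[1]]  # 1 2 3
interleaved[[1]]  # 1 4 7
```

In every scheme each row index appears in exactly one segment; only the grouping differs.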
If nrep > 1, it is assumed that each nrep consecutive rows are replicates (repeated measurements) of the same object, and care is taken that replicates are never put in different segments.
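One way to picture the replicate handling is to allocate whole sets of replicates to segments first and expand them to row indices afterwards. The base R sketch below works under that assumption and is not necessarily how the package does it. The segment layout in set_segments is made up for illustration.

```r
# Illustrative only: 8 rows, where every 2 consecutive rows are replicates.
N <- 8
nrep <- 2
n_sets <- N / nrep  # 4 sets of replicates

# Hypothetical segments expressed in terms of replicate sets (2 sets each).
set_segments <- list(c(1, 2), c(3, 4))

# Expand each set index s to its rows (s - 1) * nrep + 1, ..., s * nrep.
row_segments <- lapply(set_segments, function(sets) {
  as.vector(sapply(sets, function(s) (s - 1) * nrep + seq_len(nrep)))
})

row_segments[[1]]  # 1 2 3 4
row_segments[[2]]  # 5 6 7 8
```

Because whole sets are moved around, rows 1 and 2, rows 3 and 4, and so on can never end up in different segments.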
Warning: If k does not divide N, a specified length.seg does not divide N, or nrep does not divide length.seg, the number of segments and/or the segment length will be adjusted as needed. Warnings are printed for some of these cases, and one should always inspect the resulting segments to make sure they are as expected.
Stratification of samples is enabled by the stratify argument. This is useful if there are sub-groups in the data set that should have a proportional representation in the cross-validation segments or if the response is categorical (classification). If stratify is combined with nrep, stratify corresponds to the sets of replicates (see example).
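The idea of proportional representation can be illustrated with a small base R sketch. The helper stratified_segments is hypothetical and is not the algorithm used by the package: it simply deals the members of each stratum to the k segments in turn, continuing where the previous stratum stopped.

```r
# Hypothetical helper, for illustration only (not the package's algorithm):
# deal the members of each stratum round-robin over k segments.
stratified_segments <- function(strata, k) {
  seg_id <- integer(length(strata))
  offset <- 0
  for (s in unique(strata)) {
    idx <- which(strata == s)
    # Continue the round-robin where the previous stratum stopped.
    seg_id[idx] <- (offset + seq_along(idx) - 1) %% k + 1
    offset <- offset + length(idx)
  }
  unname(split(seq_along(strata), seg_id))
}

# Seven samples in stratum 1 and three in stratum 2, split into 3 segments.
strata <- c(rep(1, 7), rep(2, 3))
segs <- stratified_segments(strata, 3)

# Number of stratum-2 samples in each segment.
per_segment <- sapply(segs, function(s) sum(strata[s] == 2))
per_segment  # 1 1 1
```

With this allocation the three samples of the small stratum are spread evenly, one per segment, instead of landing together in a single segment.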
A list of vectors. Each vector contains the indices for one segment. The attribute "incomplete" contains the number of incomplete segments, and the attribute "type" contains the type of segments.
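A returned list of this shape can be turned into training and test splits with base R alone. The sketch below builds a small object by hand with the structure described above instead of calling the function, so the index values are made up for illustration.

```r
# Illustrative only: a hand-built object with the structure described above.
N <- 6
segs <- list(c(2, 5), c(1, 6), c(3, 4))
attr(segs, "incomplete") <- 0
attr(segs, "type") <- "random"

# Each segment serves once as the test set; the remaining rows form the
# training set.
splits <- lapply(segs, function(test) {
  list(train = setdiff(seq_len(N), test), test = test)
})

splits[[1]]$train         # 1 3 4 6
attr(segs, "incomplete")  # 0
```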
Bjørn-Helge Mevik, Ron Wehrens and Kristian Hovde Liland
## Segments for 10-fold randomised cross-validation:
cvsegments(100, 10)
## Segments with four objects, taken consecutively:
cvsegments(60, length.seg = 4, type = "cons")
## Incomplete segments
segs <- cvsegments(50, length.seg = 3)
attr(segs, "incomplete")
## Leave-one-out cross-validation:
cvsegments(100, 100)
## Leave-one-out with variable/unknown data set size n:
n <- 50
cvsegments(n, length.seg = 1)
## Data set with replicates
cvsegments(100, 25, nrep = 2)
## Note that rows 1 and 2 are in the same segment, rows 3 and 4 in the
## same segment, and so on.
## Stratification
cvsegments(10, 3, type = "consecutive", stratify = c(rep(1,7), rep(2,3)))
## Note that the last three samples are spread across the segments
## according to the stratification vector.
cvsegments(20, 3, type = "consecutive", nrep = 2, stratify = c(rep(1,7), rep(2,3)))
## Note the length of stratify matching number of replicate sets, not samples.