gseg2_discrete: Graph-Based Change-Point Detection for Changed Interval for...


View source: R/gSeg_discrete.R

Description

This function finds an interval in the sequence whose underlying distribution differs from that of the rest of the sequence when the data has repeated observations. It provides four graph-based test statistics.

Usage

gseg2_discrete(n, E, id, statistics=c("all","o","w","g","m"), l0=0.05*n, l1=0.95*n, 
   pval.appr=TRUE, skew.corr=TRUE, pval.perm=FALSE, B=100)

Arguments

n

The number of observations in the sequence.

E

The edge matrix (a "number of edges" by 2 matrix) for the similarity graph. Each row contains the node indices of an edge.
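As a minimal illustration of the expected shape of E, the sketch below builds a small edge matrix by hand. The four nodes and three edges are made up for illustration. In practice E would come from a graph-construction routine such as nnl, as in the Examples, where the graph is built on the distinct values of the data.

```r
# Hypothetical similarity graph on 4 nodes with 3 edges (made-up values).
E <- rbind(
  c(1, 2),  # edge between node 1 and node 2
  c(2, 3),  # edge between node 2 and node 3
  c(3, 4)   # edge between node 3 and node 4
)

dim(E)  # number of edges by 2
```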

id

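The index of observations (order of observations). As constructed in the Examples, it gives, for each observation in sequence order, the index of its distinct value.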
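The following toy sketch shows one way such an index vector could be built with base R, mirroring the match/unique construction used in the Examples. The sequence obs is made up for illustration.

```r
# Toy sequence with repeated observations (values are made up).
obs <- c("a", "b", "a", "c", "b", "a")

# Map each observation to the index of its distinct value,
# numbering distinct values in order of first appearance.
id <- match(obs, unique(obs))
id  # 1 2 1 3 2 1
```

Here the distinct values are "a", "b" and "c", so the first, third and sixth observations all share index 1.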

statistics

The scan statistic to be computed. A character string indicating the type of scan statistic desired. The default is "all".

"all": specifies to compute all of the scan statistics: original, weighted, generalized, and max-type;

"o", "ori" or "original": specifies the original edge-count scan statistic;

"w" or "weighted": specifies the weighted edge-count scan statistic;

"g" or "generalized": specifies the generalized edge-count scan statistic; and

"m" or "max": specifies the max-type edge-count scan statistic.

l0

The minimum length of the interval to be considered as a changed interval.

l1

The maximum length of the interval to be considered as a changed interval.
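For orientation, the defaults shown in Usage (l0=0.05*n and l1=0.95*n) can be evaluated for a given sequence length. The value n = 200 below is taken from the Examples; the snippet only reproduces the arithmetic and does not call the package.

```r
n <- 200
l0 <- 0.05 * n  # default minimum interval length
l1 <- 0.95 * n  # default maximum interval length
c(l0 = l0, l1 = l1)
```

With n = 200, candidate intervals therefore range in length from about 10 to about 190 observations unless l0 and l1 are set explicitly.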

pval.appr

If it is TRUE, the function outputs the p-value approximation based on asymptotic properties.

skew.corr

This argument is useful only when pval.appr=TRUE. If skew.corr is TRUE, the p-value approximation would incorporate skewness correction.

pval.perm

If it is TRUE, the function outputs the p-value from B permutations, where B is another argument that you can specify. Permutation can be time consuming, so use this argument with caution.

B

The number of permutations. This argument is used only when pval.perm=TRUE. The default value for B is 100.
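To give a feel for what a permutation p-value with B permutations means, here is a generic base R sketch. It is not the package's internal procedure: the statistic toy_stat, the data x, and the split at position 20 are all hypothetical stand-ins for the scan statistic and the real sequence.

```r
set.seed(1)
B <- 100                                 # number of permutations
x <- c(rnorm(20), rnorm(20, mean = 3))   # toy sequence with a shift after position 20

# Toy statistic (not the package's scan statistic):
# absolute difference in means between the two halves.
toy_stat <- function(v) abs(mean(v[1:20]) - mean(v[21:40]))

obs <- toy_stat(x)                          # statistic on the observed order
perm <- replicate(B, toy_stat(sample(x)))   # statistic under random reorderings

# Permutation p-value: share of permuted statistics at least as large as observed.
p_perm <- mean(perm >= obs)
p_perm
```

With this plain proportion, the p-value can only take values that are multiples of 1/B, so with B = 100 the smallest nonzero value is 0.01. A larger B gives finer resolution at the cost of running time.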

Value

Returns a list scanZ with tauhat, Zmax, and a vector of the scan statistics for each type of scan statistic specified. See below for more details.

tauhat_a

An estimate of the two ends of the changed interval for averaging approach.

tauhat_u

An estimate of the two ends of the changed interval for union approach.

Z_a_max

The test statistic (maximum of the scan statistics) for averaging approach.

Z_u_max

The test statistic (maximum of the scan statistics) for union approach.

Zo_a

A matrix of the original scan statistics (standardized counts) for averaging approach if statistic specified is "all" or "o".

Zo_u

A matrix of the original scan statistics (standardized counts) for union approach if statistic specified is "all" or "o".

Zw_a

A matrix of the weighted scan statistics (standardized counts) for averaging approach if statistic specified is "all" or "w".

Zw_u

A matrix of the weighted scan statistics (standardized counts) for union approach if statistic specified is "all" or "w".

S_a

A matrix of the generalized scan statistics (standardized counts) for averaging approach if statistic specified is "all" or "g".

S_u

A matrix of the generalized scan statistics (standardized counts) for union approach if statistic specified is "all" or "g".

M_a

A matrix of the max-type scan statistics (standardized counts) for averaging approach if statistic specified is "all" or "m".

M_u

A matrix of the max-type scan statistics (standardized counts) for union approach if statistic specified is "all" or "m".

Ro_a

A matrix of raw counts of the original scan statistic for averaging approach. This output only exists if the statistic specified is "all" or "o".

Ro_u

A matrix of raw counts of the original scan statistic for union approach. This output only exists if the statistic specified is "all" or "o".

Rw_a

A matrix of raw counts of the weighted scan statistic for averaging approach. This output only exists if the statistic specified is "all" or "w".

Rw_u

A matrix of raw counts of the weighted scan statistic for union approach. This output only exists if the statistic specified is "all" or "w".

pval.appr

The approximated p-value based on asymptotic theory for each type of statistic specified.

pval.perm

This output exists only when the argument pval.perm is TRUE. It is the permutation p-value from B permutations and appears for each type of statistic specified (same for perm.curve, perm.maxZs, and perm.Z).

perm.curve

A B by 2 matrix with the first column being critical values corresponding to the p-values in the second column.

perm.maxZs

A sorted vector recording the test statistics in the B permutations.

perm.Z

A B by n-squared matrix with each row being the vectorized scan statistics from each permutation run.

See Also

gSeg, gseg2, nnl, gseg1_discrete

Examples

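library(gSeg)

d = 50      # dimension of each observation
mu = 2      # size of the mean shift
tau = 100   # position of the change
n = 200     # length of the sequence

set.seed(500)
# First segment: resample rows with replacement to create repeated observations
y1_temp = matrix(rnorm(d*tau),tau)
sam1 = sample(1:tau, replace = TRUE)
y1 = y1_temp[sam1,]
# Second segment: mean shifted by mu/sqrt(d) in every coordinate
y2_temp = matrix(rnorm(d*(n-tau),mu/sqrt(d)), n-tau)
# Sample from the n-tau rows of y2_temp (equal to tau here, so results are unchanged)
sam2 = sample(1:(n-tau), replace = TRUE)
y2 = y2_temp[sam2,]

y = rbind(y1, y2)

# This data y has repeated observations
y_uni = unique(y)
# Similarity graph built on the distinct values
E = nnl(dist(y_uni), 1)

# id maps each observation to the index of its distinct value
cha = do.call(paste, as.data.frame(y))
id = match(cha, unique(cha))

r1 = gseg2_discrete(n, E, id, statistics="all")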

gSeg documentation built on Oct. 23, 2020, 5:54 p.m.