dependence.structure: determines the dependence structure
In multivariance: Measuring Multivariate Dependence Using Distance Multivariance

Description Usage Arguments Details Value References Examples

View source: R/multivariance-functions.R

Determines the dependence structure as described in [3].

dependence.structure(
  x,
  vec = 1:ncol(x),
  verbose = TRUE,
  detection.aim = NULL,
  type = "conservative",
  structure.type = "clustered",
  c.factor = 2,
  list.cdm = NULL,
  alpha = 0.05,
  p.adjust.method = "holm",
  stop.too.many = NULL,
  ...
)

`x`	matrix, each row of the matrix is treated as one sample
`vec`	vector, it indicates which columns are initially treated together as one sample
`verbose`	boolean, if `TRUE` details are printed during the detection and whenever a cluster is newly detected the (so far) detected dependence structure is plotted.
`detection.aim`	`=NULL` or a list of vectors which indicate the expected detection, see below for more details
`type`	the method used for the detection, one of '`conservative`','`resample`','`pearson_approx`' or '`consistent`'
`structure.type`	either the '`clustered`' or the '`full`' structure is detected
`c.factor`	numeric, larger than 0, a constant factor used in the case of '`type = "consistent"`'
`list.cdm`	not required, the list of doubly centered distance matrices corresponding to `x` speeds up the computation if given
`alpha`	numeric between 0 and 1, the significance level used for the tests
`p.adjust.method`	a string indicating the p-value adjustment for multiple testing, see `p.adjust.methods`
`stop.too.many`	numeric, upper limit for the number of tested tuples. A warning is issued if it is used. Use `stop.too.many = NULL` for no limit.
`...`	these are passed to `find.cluster`

Performs the detection of the dependence structure as described in [3]. In the clustered structure variables are clustered and treated as one variable as soon as a dependence is detected, the full structure treats always each variable separately. The detection is either based on tests with significance level alpha or a consistent estimator is used. The latter yields (in the limit for increasing sample size) under very mild conditions always the correct dependence structure (but the convergence might be very slow).

If fixed.rejection.level is not provided, the significance level alpha is used to determine which multivariances are significant using the distribution-free rejection level. As default the Holm method is used for p-value correction corresponding to multiple testing.

The resulting graph can be simplified (pairwise dependence can be represented by edges instead of vertices) using clean.graph.

Advanced: The argument detection.aim is currently only implemented for structure.type = clustered. It can be used to check, if an expected dependence structure was detected. This might be useful for simulation studies to determine the empirical power of the detection algorithm. Hereto detection.aim is set to a list of vectors which indicate the expected detected dependence structures (one for each run of find.cluster). The vector has as first element the k for which k-tuples are detected (for this aim the detection stops without success if no k-tuple is found), and the other elements, indicate to which clusters all present vertices belong after the detection, e.g. c(3,2,2,1,2,1,1,2,1) expects that 3-tuples are detected and in the graph are 8 vertices (including those representing the detected 3 dependencies), the order of the 2's and 1's indicate which vertices belong to which cluster. If detection.aim is provided, the vector representing the actual detection is printed, thus one can use the output with copy-paste to fix successively the expected detection aims.

Note that a failed detection might invoke the warning:

1 2	run$mem == detection.aim[[k]][-1] : longer object length is not a multiple of shorter object length

returns a list with elements:

multivariances: calculated multivariances,
cdms: calculated doubly centered distance matrices,
graph: graph representing the dependence structure,
detected: boolean, this is only included if a detection.aim is given,
number.of.dep.tuples: vector, with the number of dependent tuples for each tested order. For the full dependence structure a value of -1 indicates that all tuples of this order are already lower order dependent, a value of -2 indicates that there were more than stop.too.many tuples,
structure.type: either clustered or full,
type: the type of p-value estimation or consistent estimation used,
total.number.of.tests: numeric vector, with the number of tests for each group of tests,
typeI.error.prob: estimated probability of a type I error,
alpha: significance level used if a p-value estimation procedure is used,
c.factor: factor used if a consistent estimation procedure is used,
parameter.range: significance levels (or 'c.factor' values) which yield the same detection result.

For the theoretic background see the reference [3] given on the main help page of this package: multivariance-package.

# structures for the datasets included in the package
dependence.structure(dep_struct_several_26_100)
dependence.structure(dep_struct_star_9_100)
dependence.structure(dep_struct_iterated_13_100)
dependence.structure(dep_struct_ring_15_100)

# basic examples:

x = coins(100) # 3-dependent
dependence.structure(x)

colnames(x) = c("A","B","C")
dependence.structure(x) # names of variables are used as labels

dependence.structure(coins(100),vec = c(1,1,2))
# 3-dependent rv of which the first two rv are used together as one rv, thus 2-dependence.

dependence.structure(x,vec = c(1,1,2)) # names of variables are used as labels


dependence.structure(cbind(coins(200),coins(200,k=5)),verbose = TRUE)
#1,2,3 are 3-dependent, 4,..,9 are 6-dependent

# similar to the the previous example, but
# the pair 1,3 is treated as one sample,
# anagously the pair 2,4. In the resulting structure one does not
# see anymore that the dependence of 1,2,3,4 with the rest is due
# to 4.
dependence.structure(cbind(coins(200),coins(200,k=5)),
                           vec = c(1,2,1,2,3,4,5,6,7),verbose = TRUE)


### Advanced:

# How to check the empirical power of the detection algorithm?
# Use a dataset for which the structure is detected, e.g. dep_struct_several_26_100.
# run:
dependence.structure(dep_struct_several_26_100,
                     detection.aim = list(c(ncol(dep_struct_several_26_100))))
# The output provides the first detection aim. Now we run the same line with the added
# detection aim
dependence.structure(dep_struct_several_26_100,detection.aim = list(c(3,1, 1, 1, 2, 2, 2, 3, 4,
  5, 6, 7, 8, 8, 8, 9, 9, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 1, 2, 8, 9),
  c(ncol(dep_struct_several_26_100))))
# and get the next detection aim ... thus we finally obtain all detection aims.
# now we can run the code with new sample data ....
N = 100
dependence.structure(cbind(coins(N,2),tetrahedron(N),coins(N,4),tetrahedron(N),
                           tetrahedron(N),coins(N,3),coins(N,3),rnorm(N)),
                     detection.aim = list(c(3,1, 1, 1, 2, 2, 2, 3, 4, 5, 6, 7, 8, 8, 8,
  9, 9, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 1, 2, 8, 9),
  c(4,1, 1, 1, 2, 2, 2, 3, 4, 5, 6, 7, 8, 8, 8, 9, 9, 9, 10, 10, 10, 10, 11, 11, 11,
    11, 12, 1, 2, 8, 9, 10, 11),
  c(5, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7, 8, 1,
    2, 4, 5, 6, 7, 3),
  c(5, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7, 8, 1,
    2, 4, 5, 6, 7, 3)))$detected
# ... and one could start to store the results and compute the rate of successes.

# ... or one could try to check how many samples are necessary for the detection:
re = numeric(100)
for (i in 2:100) {
  re[i] =
    dependence.structure(dep_struct_several_26_100[1:i,],verbose = FALSE,
                         detection.aim = list(c(3,1, 1, 1, 2, 2, 2, 3, 4, 5, 6, 7, 8,
      8, 8, 9, 9, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 1, 2, 8, 9),
      c(4,1, 1, 1, 2, 2, 2, 3, 4, 5, 6, 7, 8, 8, 8, 9, 9, 9, 10, 10, 10, 10, 11, 11,
        11, 11, 12, 1, 2, 8, 9, 10, 11),
      c(5, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7,
        8, 1, 2, 4, 5, 6, 7, 3),
      c(5, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7,
        8, 1, 2, 4, 5, 6, 7, 3)))$detected
  print(paste("First", i,"samples. Detected?", re[i]==1))
}
cat(paste("Given the 1 to k'th row the structure is not detected for k =",which(re == FALSE),"\n"))