disstree: Dissimilarity Tree


disstree    R Documentation

Dissimilarity Tree

Description

Tree-structured discrepancy analysis of objects described by their pairwise dissimilarities.

Usage

disstree(formula, data = NULL, weights = NULL, min.size = 0.05,
  max.depth = 5, R = 1000, pval = 0.01, object = NULL,
  weight.permutation = "replicate", squared = FALSE, first = NULL,
  minSize, maxdepth)

Arguments

formula

Formula with a dissimilarity matrix as the left-hand side and the candidate partitioning variables on the right-hand side.

data

Data frame in which the variables of the formula are looked up.

weights

Optional numerical vector of weights.

min.size

Minimum number of cases in a node. Values less than 1 are treated as a proportion of the cases.
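The rule for min.size can be sketched in base R as follows. This is a minimal illustration, not TraMineR code: the helper names min_cases and split_admissible are hypothetical, and the sketch assumes that a proportion is taken of the total (unweighted) number of cases and that a binary split is only allowed when both child nodes reach the minimum.

```r
## Hypothetical helpers, not part of TraMineR. Assumes a proportion is taken
## of the total (unweighted) number of cases, n_total; weights are ignored.
min_cases <- function(min.size, n_total) {
  if (min.size < 1) min.size * n_total else min.size
}

## A binary split (TRUE = left child) is admissible only if both children
## contain at least the minimum number of cases.
split_admissible <- function(in_left, min.size, n_total = length(in_left)) {
  m <- min_cases(min.size, n_total)
  sum(in_left) >= m && sum(!in_left) >= m
}

## With 200 cases, the default min.size = 0.05 means 10 cases per node
in_left <- c(rep(TRUE, 5), rep(FALSE, 195))
split_admissible(in_left, 0.05)   # FALSE: the left child has only 5 cases
```

For nodes below the root, n_total would still be the size of the whole sample in this sketch, which is why it is a separate argument.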

max.depth

Maximum depth of the tree.

R

Number of permutations used to assess the significance of each split.

pval

Maximum p-value for a split to be retained.

object

An optional R object represented by the dissimilarity matrix. This object may be used by the print method or by disstree2dot to render specific object types.

weight.permutation

Weight permutation method: "diss" (attach weights to the dissimilarity matrix), "replicate" (replicate cases using weights), "rounded-replicate" (replicate cases using rounded weights), or "random-sampling" (randomly assign covariate profiles to the objects using distributions defined by the weights).
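To illustrate the "rounded-replicate" idea, the base R sketch below expands a weighted data set by repeating each case round(w) times, so that covariate profiles can then be permuted over the expanded cases. The helper expand_rounded is hypothetical; this shows the replication step only and is not necessarily how TraMineR implements the option.

```r
## Hypothetical helper, not part of TraMineR: index vector in which case i
## appears round(w[i]) times (cases whose rounded weight is 0 disappear).
expand_rounded <- function(w) {
  rep(seq_along(w), times = round(w))
}

w <- c(1.2, 2.7, 0.4)          # case weights
g <- c("a", "b", "a")          # one covariate profile per case
idx <- expand_rounded(w)       # 1 2 2 2
g_rep <- g[idx]                # covariate of the replicated data set
g_perm <- sample(g_rep)        # one random permutation of the profiles
```

Note that R's round() rounds halves to the even digit (round(2.5) is 2), so weights ending in .5 may be handled differently under another rounding rule.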

squared

Logical. Should the dissimilarities on the left-hand side of the formula be squared?

first

One of the variables on the right-hand side of the formula. This forces the first node of the tree to be split by this variable.

minSize

Deprecated. Use min.size instead.

maxdepth

Deprecated. Use max.depth instead.

Details

The procedure splits the data iteratively. At each step, it selects the variable and split that explain the greatest part of the discrepancy, i.e., the split with the highest pseudo R2. The significance of the retained split is assessed through a permutation test.
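The single-split step described above can be sketched in base R. This is a minimal illustration, not TraMineR's implementation: the helper names (diss_ss, pseudo_r2, best_split, perm_pval) are hypothetical, weights are ignored, squared = FALSE is assumed, and categorical covariates are assumed to be split by exhaustively enumerating the binary groupings of their levels. The sum of squares follows the discrepancy-analysis definition SS = (1/n) sum_{i<j} d_ij, and the pseudo R2 is the share of the total sum of squares not accounted for by the within-group sums of squares.

```r
## Hypothetical helpers, not TraMineR's internals (weights ignored,
## squared = FALSE assumed).

## Sum of squares from pairwise dissimilarities: SS = (1/n) * sum_{i<j} d_ij
diss_ss <- function(m) {
  sum(m[upper.tri(m)]) / nrow(m)
}

## Pseudo R2 of grouping g: (SS_T - SS_W) / SS_T
pseudo_r2 <- function(d, g) {
  d <- as.matrix(d)
  g <- droplevels(as.factor(g))
  ss_t <- diss_ss(d)
  ss_w <- sum(vapply(levels(g), function(l) {
    idx <- which(g == l)
    diss_ss(d[idx, idx, drop = FALSE])
  }, numeric(1)))
  (ss_t - ss_w) / ss_t
}

## Score every binary grouping of the levels of each candidate factor and keep
## the split with the highest pseudo R2; `first` restricts the search to one
## variable, mimicking the argument of the same name.
best_split <- function(d, covariates, first = NULL) {
  vars <- if (is.null(first)) names(covariates) else first
  best <- list(var = NA_character_, left = character(0), r2 = -Inf)
  for (v in vars) {
    x <- droplevels(as.factor(covariates[[v]]))
    lev <- levels(x)
    k <- length(lev)
    if (k < 2) next
    bits <- as.integer(2^(seq_len(k - 1) - 1))
    ## The first level always goes left, so mirror splits are not scored twice
    for (code in seq_len(2^(k - 1) - 1)) {
      left <- lev[c(TRUE, bitwAnd(code, bits) == 0L)]
      r2 <- pseudo_r2(d, x %in% left)
      if (r2 > best$r2) best <- list(var = v, left = left, r2 = r2)
    }
  }
  best
}

## Permutation test of a retained split: R pseudo R2 values, the observed one
## included, so the smallest attainable p-value is 1/R.
perm_pval <- function(d, g, R = 1000) {
  stopifnot(R >= 2)
  obs <- pseudo_r2(d, g)
  perm <- replicate(R - 1, pseudo_r2(d, sample(g)))
  mean(c(obs, perm) >= obs)
}
```

In this sketch, growing a full tree would amount to calling best_split again on each child node until max.depth is reached, a child would fall below min.size, or perm_pval exceeds pval. Counting the observed statistic among the R values is one common convention; it matches the remark in the examples that with R = 10 the p-value cannot be lower than 0.1.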

seqtree provides a simpler interface if you plan to use disstree for state sequence objects.

Value

An object of class disstree that contains the following components:

root

A node object, the root of the tree.

info

General information, such as the parameters used to build the tree.

info$adjustment

A dissassoc object providing global statistics for the tree.

formula

The formula used to generate the tree.

data

The data used to build the tree.

weights

The weights used to build the tree.

Author(s)

Matthias Studer (with Gilbert Ritschard for the help page)

References

Studer, M., G. Ritschard, A. Gabadinho and N. S. Müller (2011). Discrepancy analysis of state sequences, Sociological Methods and Research, Vol. 40(3), 471-510, doi:10.1177/0049124111415372.

Studer, M., G. Ritschard, A. Gabadinho and N. S. Müller (2010). Discrepancy analysis of complex objects using dissimilarities. In F. Guillet, G. Ritschard, D. A. Zighed and H. Briand (Eds.), Advances in Knowledge Discovery and Management, Studies in Computational Intelligence, Volume 292, pp. 3-19. Berlin: Springer.

Studer, M., G. Ritschard, A. Gabadinho and N. S. Müller (2009). Analyse de dissimilarités par arbre d'induction. In EGC 2009, Revue des Nouvelles Technologies de l'Information, Vol. E-15, pp. 7-18.

Anderson, M. J. (2001). A new method for non-parametric multivariate analysis of variance. Austral Ecology 26, 32-46.

Batagelj, V. (1988). Generalized Ward and related clustering problems. In H. Bock (Ed.), Classification and related methods of data analysis, Amsterdam: North-Holland, pp. 67-74.

Piccarreta, R. and F. C. Billari (2007). Clustering work and family trajectories by using a divisive algorithm. Journal of the Royal Statistical Society A 170(4), 1061-1078.

See Also

seqtree to generate a specific disstree object for analyzing state sequences.
seqtreedisplay to generate a graphical representation of seqtree objects when analyzing state sequences.
disstreedisplay is a more general interface to generate such representations for other types of objects.
disstreeleaf to get leaf membership of each case.
disstree.get.rules to get the list of classification rules as R commands.
disstree.assign to get the index of the rules that apply to the provided profiles.
dissvar to compute discrepancy using dissimilarities and for a basic introduction to discrepancy analysis.
dissassoc to test association between objects represented by their dissimilarities and a covariate.
dissmfacw to perform multi-factor analysis of variance from pairwise dissimilarities.
disscenter to compute the distance of each object to its group center from pairwise dissimilarities.

Examples

library(TraMineR)
data(mvad)

## Defining a state sequence object
mvad.seq <- seqdef(mvad[, 17:86])

## Computing dissimilarities (any dissimilarity measure can be used)
mvad.ham <- seqdist(mvad.seq, method="HAM")
## Grow the tree using a low R value for illustration.
## For R=10, pval cannot be lower than 0.1
dt <- disstree(mvad.ham ~ male + Grammar + funemp + gcse5eq + fmpr + livboth,
               data=mvad, R = 10, pval = 0.1)
print(dt)


## Will only work if Graphviz is properly installed
## See seqtree for a simpler way to plot a sequence tree.
## Not run: 
disstreedisplay(dt, image.fun = seqdplot, image.data = mvad.seq,
                ## Additional parameters passed to seqdplot
                with.legend = FALSE, xaxis = FALSE, ylab = "", border=NA)

## End(Not run)

## Second method, using a specific function
myplotfunction <- function(individuals, seqs, ...) {
  par(font.sub=2, mar=c(3,0,6,0), mgp=c(0,0,0))
  ## use MDS (cmdscale) to order the sequences in seqIplot
  mds <- suppressMessages(cmdscale(seqdist(seqs[individuals,], method="HAM"),k=1))
  seqIplot(seqs[individuals,], sortv=mds,...)
}

## If image.data is not set, indexes of individuals are sent to image.fun
## Not run: 
disstreedisplay(dt, image.fun = myplotfunction, cex.main = 3,
                ## additional parameters passed to myplotfunction
                seqs = mvad.seq,
                ## additional parameters passed to seqIplot (through myplotfunction)
                with.legend = FALSE, xaxis = FALSE, ylab = "")

## End(Not run)

## Retrieving terminal node membership
term.leaf <- disstreeleaf(dt)
table(term.leaf)

## Retrieving classification rules
rules <- disstree.get.rules(dt)

## Index of rule (terminal leaf) that applies to a specified profile
## covariates are: male, Grammar, funemp, gcse5eq, fmpr, livboth

profile <- data.frame(male="no", Grammar="yes", funemp="no", gcse5eq="yes", fmpr="no", livboth="no")
rules[disstree.assign(rules, profile=profile)]


TraMineR documentation built on May 29, 2024, 5 a.m.