dissrf: Relative Frequency Groups.

dissrf R Documentation

Description

Relative Frequency (RF) groups are equally sized groups obtained by partitioning sorted cases into k consecutive groups. Function dissrf returns the medoid indexes of the RF groups and related statistics. Function seqrf is for sequence data and additionally returns the RF medoid sequences.

Usage

dissrf(diss,
       k=NULL,
       sortv="mds",
       weights=NULL,
       grp.meth = "prop",
       squared = FALSE,
       pow = NULL)

seqrf(seqdata,
       diss,
       k=NULL,
       sortv="mds",
       weights=NULL,
       weighted=TRUE,
       grp.meth = "prop",
       squared = FALSE,
       pow = NULL)

## S3 method for class 'dissrf'
summary(object, dist.idx = 1:10, ...)

## S3 method for class 'seqrf'
summary(object, format="SPS", dist.idx = 1:10, ...)

Arguments

diss

Matrix or distance object. Pairwise dissimilarities between analyzed cases.

seqdata

State sequence stslist object as produced by seqdef.

k

Integer: Number of RF groups. When NULL, k is set to the minimum of 100 and the sum of weights divided by 10.

sortv

Real vector (of length nrow(diss)), character string, or NULL. Sorting variable used to compute the frequency groups. If NULL, the original data order is used. If "mds" (default), the first MDS factor of diss (diss^2 when squared=TRUE) is used. Ties are handled as explained in Details. For seqrf only, sortv can also be one of "from.start" and "from.end".

weights

Vector (of length nrow(diss)) of non-negative weights. If NULL (default), equal weights are used, except in seqrf when weighted=TRUE and seqdata has weights.

weighted

Logical. Should weights be used when there are weights in seqdata? (default is TRUE)

grp.meth

Character string. One of "prop", "first", and "random". Grouping method. See details.

squared

Logical. Should medoids (and computation of sortv when applicable) be based on squared dissimilarities? (default is FALSE)

pow

Double. Dissimilarity power exponent (typically 1 or 2) for computation of pseudo R2 and F. When NULL, pow is set as 1 when squared = FALSE, and as 2 otherwise.

...

Further arguments passed to or from other methods such as print.stslist.

object

Object of class dissrf or seqrf

format

String. One of "SPS" (default) or "STS". Display format of the medoid sequences.

dist.idx

Indexes of the RF groups for which summary statistics of distances to the medoids are displayed. Default is 1:10. Set to 0 to display statistics for all RF groups.

Details

Function dissrf partitions the n cases (rows of the diss matrix) into k equally sized groups (RF groups). First, the cases are sorted according to the sortv variable. Then the groups are built by consecutively grouping the first n/k cases, then the next n/k cases, and so on. In seqrf, sortv can also be set to one of the sorting methods "from.start" and "from.end".
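The sort-then-cut procedure described above can be sketched in base R for the simple case where n is a multiple of k. This is an illustrative sketch, not the code used by dissrf; the names sortv, ord, grp.of.sorted, group, and index.list are chosen here for illustration, and the sorting variable is simulated.

```r
## Sketch only: equal-size consecutive groups when n is a multiple of k
set.seed(1)
n <- 12
k <- 4
sortv <- runif(n)                       # hypothetical sorting variable

ord <- order(sortv)                     # case indexes in sorted order
grp.of.sorted <- rep(seq_len(k), each = n / k)  # 1,1,1,2,2,2,...

## Group membership expressed in the original case order
group <- integer(n)
group[ord] <- grp.of.sorted

## Case indexes of each successive group
index.list <- split(ord, grp.of.sorted)

table(group)
```

Each group then holds n/k consecutive cases of the sorted data, so the largest sortv value in one group never exceeds the smallest value in the next group.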

Ties in the sortv variable are handled by order using the default method, which produces a stable outcome. To use a different method, compute a suitable variable without ties (e.g. using order with the wanted method for ties) and pass it as the sortv argument.
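As a hedged illustration of tie handling, the sketch below first shows the stable behavior of order on a small vector with ties, then builds a tie-free variable. Using rank with ties.method = "random" is an assumption about one convenient way to break ties; it is not prescribed by this help page, and x is a made-up vector.

```r
## Hypothetical sorting variable with ties
x <- c(3, 1, 2, 1, 3)

## Default ordering is stable: tied cases keep their original relative order
order(x)   # 2 4 3 1 5

## One possible way to get a tie-free variable: random tie breaking
set.seed(123)
sortv.noties <- rank(x, ties.method = "random")

## sortv.noties has no ties and sorts the cases consistently with x
anyDuplicated(sortv.noties)
x[order(sortv.noties)]
```

The resulting sortv.noties could then be passed as the sortv argument.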

The grp.meth argument applies when the group size (n/k) is not integer. With grp.meth="first", the integer part of n/k is used as basic group size and the size of the first groups is augmented by one unit so that the sum of the group sizes equals n. With grp.meth="random", randomly selected groups have their size augmented by one unit, and with grp.meth="prop" (default), cases at the limit between groups are proportionally assigned to each of the two groups.
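The group sizes implied by the three methods can be worked out by hand for n = 100 and k = 12, where n/k is not an integer. The sketch below only computes sizes under the description above; it does not reproduce the internal code of dissrf, and the variable names are illustrative.

```r
n <- 100
k <- 12
base  <- n %/% k    # integer part of n/k: 8
extra <- n %% k     # cases left over: 4

## "first": the first `extra` groups get one additional case
sizes.first <- rep(base, k)
sizes.first[seq_len(extra)] <- base + 1

## "random": `extra` randomly selected groups get one additional case
set.seed(42)
sizes.random <- rep(base, k)
sizes.random[sample.int(k, extra)] <- base + 1

## "prop": every group has the same fractional size n/k, the cases at
## the group limits being shared between the two adjacent groups
size.prop <- n / k

sizes.first
sum(sizes.first)
sum(sizes.random)
```

With "first" and "random" the sizes differ by at most one case across groups, while "prop" keeps all groups at exactly n/k.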

For seqrf, when weights=NULL and weighted=TRUE, weights is set as the weights attribute of seqdata.

When weights is non-null (dissrf) or when weighted=TRUE and there are weights in seqdata (seqrf), only grp.meth="prop" applies.

The function computes indicative statistics of the resulting partition, namely a pseudo R2 and a pseudo F statistic. These statistics compare the mean distance to the group medoid with the mean distance to the overall medoid. When pow is 2, mean squared dissimilarities are used, and when pow is 1, the R2 and F ratios are based on the mean of non-squared dissimilarities. An indicative p-value of the F statistic is computed using the F distribution. This p-value should be interpreted with caution since F is not a true F value.
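The two quantities being compared can be illustrated with a small unweighted sketch on simulated data. Here a medoid is taken as the case with the smallest sum of distances to the other cases of its set. The helper medoid, the simulated points, and the final ratio are illustrative assumptions; the exact formulas that dissrf uses to derive the pseudo R2 and F from these mean distances are not reproduced here.

```r
set.seed(7)
pts <- matrix(rnorm(40), ncol = 2)      # 20 hypothetical cases
d <- as.matrix(dist(pts))               # pairwise dissimilarities
n <- nrow(d)
group <- rep(1:4, each = 5)             # hypothetical crisp grouping
pow <- 1                                # 1: raw distances, 2: squared

## Hypothetical helper: index of the medoid of the cases in idx
medoid <- function(idx) {
  idx[which.min(colSums(d[idx, idx, drop = FALSE]))]
}

overall <- medoid(seq_len(n))                     # overall medoid
grp.med <- sapply(split(seq_len(n), group), medoid)  # one medoid per group

## Distance of each case to the overall medoid and to its group medoid
dist.overall <- d[, overall]^pow
dist.group   <- d[cbind(seq_len(n), grp.med[group])]^pow

## Mean distance to group medoids relative to the overall medoid
ratio <- mean(dist.group) / mean(dist.overall)
ratio
```

A small ratio indicates that cases lie much closer to their group medoid than to the overall medoid.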

Value

dissrf returns a list of class dissrfprop when grp.meth="prop" and of class dissrfcrisp otherwise. In both cases the list also receives class "dissrf". The elements of the list are:

medoids

index of the group medoids

med.names

names (diss colnames) of the group medoids

wg

working matrix used by the "prop" procedure (class dissrfprop only)

dist.list

list giving, for each successive group, the distances from its elements to the group medoid

index.list

list giving, for each successive group, the indexes of its elements

weights.list

list giving, for each successive group, the weights of its elements in the group

heights

relative group sizes, which may differ across groups when grp.meth is "first" or "random"

kmedoid.index

vector with for each case the index of its group medoid (class dissrfcrisp only)

kmedoid.dist

vector with for each case the distance to its group medoid (class dissrfcrisp only)

mdsk

vector of group membership (class dissrfcrisp only)

at

positions for the boxplots of distances to group medoids

R2

Pseudo R2: Mean distance to the group medoids over mean distance to the overall medoid

Fstat

Pseudo F statistic

pvalue

p-value of the pseudo F (to be used with caution since F is not a true F value)

sizes

ncase (number of cases), wsum (sum of weights), k (number of groups), gsize (group size)

grp.meth

grouping method used

seqrf returns a list of class seqrfprop when grp.meth="prop" and of class seqrfcrisp otherwise. In both cases the list also receives class "seqrf". The elements of the list are:

seqtoplot

RF medoid sequences as a state sequence stslist object

rf

the associated dissrf object

There are print and summary methods for objects of class dissrf and seqrf, and a plot method for objects of class seqrf.

Author(s)

Gilbert Ritschard.

References

Fasang, Anette Eva and Tim F. Liao. 2014. "Visualizing Sequences in the Social Sciences: Relative Frequency Sequence Plots." Sociological Methods & Research 43(4):643-676.

See Also

plot.seqrf, seqrfplot, dissrep, and seqrep

Examples

## Defining a sequence object with the data in columns 10 to 25
## (family status from age 15 to 30) in the biofam data set
data(biofam)
biofam.lab <- c("Parent", "Left", "Married", "Left+Marr",
"Child", "Left+Child", "Left+Marr+Child", "Divorced")

## Here, we use only 100 cases selected such that all elements
## of the alphabet are present.
## (More cases and a larger k would be necessary to get a meaningful example.)
biofam.seq <- seqdef(biofam[501:600, 10:25], labels=biofam.lab,
                    weights=biofam[501:600,"wp00tbgs"])
diss <- seqdist(biofam.seq, method="LCS")

## Using 12 groups, default MDS sorting,
##  and original method by Fasang and Liao (2014)
dissrf(diss=diss, k=12, grp.meth="first")

## Using 12 groups, weights, default MDS sorting,
##  and default "prop" method
w <- attr(biofam.seq, "weights")
dissrf(diss=diss, k=12, weights=w)

## With a user specified sorting variable
## Here time spent in parental home, which has ties
parentTime <- seqistatd(biofam.seq)[, 1]
b.srf <- seqrf(biofam.seq, diss=diss, k=12, sortv=parentTime)

## print, summary, and plot methods
b.srf
summary(b.srf)
plot(b.srf)
plot(b.srf, which.plot="both")


TraMineR documentation built on May 29, 2024, 5 a.m.