
Nominal variables can be encoded as a combination of a sparse incidence matrix and a sparse index matrix. Various functions to compute variations of `assocSparse` and `cosSparse` for such data are described here.
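To make this encoding concrete, here is a minimal hand-rolled sketch that uses only the Matrix package. It builds an observations × values incidence matrix `OV` and an attributes × values index matrix `AV` for a toy data frame. The data frame `df`, the `attribute:value` column labels and the construction itself are hypothetical choices for illustration; in practice `splitTable` is the package's own tool for this step.

```r
# Hypothetical hand-rolled encoding of a toy nominal data set, for
# illustration only; splitTable is the package's own tool for this.
library(Matrix)

df <- data.frame(
  colour = factor(c("red", "blue", "red", "green")),
  size   = factor(c("small", "large", "large", "small"))
)

n_levels <- vapply(df, nlevels, integer(1))        # number of values per attribute
offsets  <- c(0, cumsum(n_levels))[seq_along(df)]  # column offset of each attribute's block

# one column per attribute value, labelled "attribute:value"
value_labels <- unlist(lapply(names(df), function(a) paste(a, levels(df[[a]]), sep = ":")))

# OV: observations x values incidence matrix (pattern matrix)
obs <- rep(seq_len(nrow(df)), times = ncol(df))
val <- unlist(lapply(seq_along(df), function(k) offsets[k] + as.integer(df[[k]])))
OV <- sparseMatrix(i = obs, j = val,
                   dims = c(nrow(df), length(value_labels)),
                   dimnames = list(NULL, value_labels))

# AV: attributes x values index matrix; each row marks one group of columns
AV <- sparseMatrix(i = rep(seq_along(df), times = n_levels),
                   j = seq_along(value_labels),
                   dimnames = list(names(df), value_labels))
```

Every row of `OV` has exactly one entry per attribute, and every column of `AV` belongs to exactly one attribute.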


| Argument | Description |
|---|---|
| `X, Y` | sparse matrices in a format of the |
| `colGroupX, colGroupY` | sparse matrices (typically pattern matrices) with the same number of columns as `X` and `Y`, respectively, indicating which columns belong to the same group. Each row of these matrices represents a group. |
| `rowGroup` | sparse matrix (typically a pattern matrix) with the same number of rows as `X` (and `Y`, when not `NULL`), indicating which rows belong to the same group. Each column of this matrix represents a group. |
| `norm` | norm to be used. See |
| `weight` | weighting of rows. See |
| `method` | method to be used. See |
| `sparse` | All methods try to be as sparse as possible. Specifically, when there are no observed co-occurrences, nothing is computed. This might lead to slight deviations in the results for some methods. Set |
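The two grouping arguments are oriented differently: `colGroupX` has one row per group, while `rowGroup` has one column per group. A minimal sketch, assuming a hypothetical grouping vector `g` and using only `Matrix::sparseMatrix`, shows how both could be built from the same group assignment.

```r
library(Matrix)

# hypothetical grouping: the first three columns (or rows) of X belong to
# group 1, the last two to group 2
g <- c(1, 1, 1, 2, 2)

# colGroupX: one ROW per group, one column per column of X
colGroupX <- sparseMatrix(i = g, j = seq_along(g))

# rowGroup: one COLUMN per group, one row per row of X
rowGroup <- sparseMatrix(i = seq_along(g), j = g)

dim(colGroupX)  # 2 5
dim(rowGroup)   # 5 2
```

Both are pattern matrices, and for the same assignment one is simply the transpose of the other.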

The approaches `assoc` and `cos` are described in detail in `assocSparse` and `cosSparse`, respectively. Those methods are extended here to the case where either the columns (`.col`) or the rows (`.row`) form groups. Specifically, this occurs with the sparse encoding of nominal variables (see `splitTable`). In such an encoding, the different values of a nominal variable are encoded in separate columns. However, these columns cannot be treated independently but have to be treated as groups.
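Why the value columns of one nominal variable are not independent can be seen in a minimal sketch, assuming hand-built numeric 0/1 matrices (`OV`, `AV` and the value labels are hypothetical): every observation has exactly one value per variable, so the columns within a group are mutually exclusive and never co-occur.

```r
library(Matrix)

# Hypothetical toy data: one nominal attribute with the values "a", "b", "c",
# observed for 3 observations and stored as 3 separate 0/1 columns.
OV <- sparseMatrix(i = 1:3, j = c(1, 3, 2), x = 1, dims = c(3, 3),
                   dimnames = list(NULL, c("a", "b", "c")))
# all three value columns belong to the same attribute (one group)
AV <- sparseMatrix(i = c(1, 1, 1), j = 1:3, x = 1, dims = c(1, 3))

# number of values each observation has for the attribute: always exactly 1
per_attribute <- as.matrix(OV %*% t(AV))[, 1]
per_attribute  # 1 1 1

# consequently, two columns of the same group never co-occur:
# all off-diagonal co-occurrence counts are zero
cooc <- as.matrix(crossprod(OV))
```

Treating such columns as independent binary variables would ignore this built-in dependency.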

The `.col` methods should be used when similarities between the different values of nominal variables are to be computed. The `.row` methods should be used when similarities between the observations of nominal variables are to be computed.
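The difference in what gets compared shows up in the shape of the result. The sketch below uses plain co-occurrence counts on a hypothetical 0/1 incidence matrix `OV`; these raw counts are not the package's association or cosine measures, they only illustrate that a values × values result corresponds to the `.col` case and an observations × observations result to the `.row` case.

```r
library(Matrix)

# Hypothetical 0/1 incidence matrix: 4 observations (rows) x 3 values (columns)
OV <- sparseMatrix(i = c(1, 2, 3, 4, 1, 3), j = c(1, 1, 2, 2, 3, 3),
                   x = 1, dims = c(4, 3))

# co-occurrence counts between VALUES (columns): values x values,
# the shape of a .col result
value_cooc <- as.matrix(crossprod(OV))

# counts of shared values between OBSERVATIONS (rows): observations x observations,
# the shape of a .row result
obs_cooc <- as.matrix(tcrossprod(OV))

dim(value_cooc)  # 3 3
dim(obs_cooc)    # 4 4
```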

Note that the calculations of the `assoc` functions really only make sense for binary data (i.e. matrices with only ones and zeros). Currently, all input is coerced to such data by `as(X, "nMatrix")*1`, meaning that all values that are not one or zero are turned into one (including negative values!).
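The effect of this coercion can be checked directly with the Matrix package. In this sketch the input matrix `X` is a hypothetical example containing a non-integer and a negative value; both end up as one.

```r
library(Matrix)

# arbitrary non-binary input, including a negative value
X <- sparseMatrix(i = c(1, 2, 3), j = c(1, 2, 3), x = c(2.5, -1, 1))

# the coercion described above: keep only the pattern of non-zero entries,
# then multiply by 1 to obtain a numeric matrix again
B <- as(X, "nMatrix") * 1
as.matrix(B)  # 3 x 3 identity: 2.5 and -1 have both become 1
```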

When `Y = NULL`, all methods return symmetric similarity matrices of class `dsCMatrix`, in which only the upper triangle is specified. The only exception is when `sparse = FALSE` is chosen; then the result will be of class `dsyMatrix`.

When a second matrix `Y` is specified, the result will be of class `dgCMatrix` or, with `sparse = FALSE`, `dgeMatrix`.
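What "only specifying the upper triangle" means for a `dsCMatrix` can be seen in a small sketch. The matrix `U` below is a hypothetical similarity matrix, turned into a `dsCMatrix` with the Matrix function `forceSymmetric`; this only illustrates the storage format, not how the package builds its results.

```r
library(Matrix)

# a small symmetric similarity matrix given by its upper triangle only
U <- sparseMatrix(i = c(1, 1, 2), j = c(1, 2, 2), x = c(1, 0.5, 1), dims = c(2, 2))
S <- forceSymmetric(U, uplo = "U")

S@uplo        # "U": the upper triangle is the stored one
length(S@x)   # 3: the lower-triangle entry is not stored separately
as.matrix(S)  # expands to the full symmetric 2 x 2 matrix
```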

Note that these methods automatically take missing data into account. They also work with large amounts of missing data, but the validity of any similarity computed from data with many missing values is, of course, questionable.
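How the package corrects for missing data is not spelled out here. The following sketch only illustrates, under a hypothetical hand-built encoding (`OV`, `AV` and the value labels are assumptions), how a missing value shows up in the sparse representation: the observation simply has no entry in that variable's group of columns, which can be detected by summing within the group.

```r
library(Matrix)

# Hypothetical encoding of one nominal attribute with values "a" and "b"
# for 3 observations; the third observation is missing (NA), so it has
# no non-zero entry in this attribute's columns.
OV <- sparseMatrix(i = c(1, 2), j = c(1, 2), x = 1, dims = c(3, 2),
                   dimnames = list(NULL, c("a", "b")))
AV <- sparseMatrix(i = c(1, 1), j = c(1, 2), x = 1, dims = c(1, 2))

# values observed per observation for this attribute: 1 = observed, 0 = missing
observed <- as.matrix(OV %*% t(AV))[, 1]
observed  # 1 1 0
```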

Michael Cysouw

See `sim.att` and `sim.obs` for convenient shortcuts around these methods.

```r
# the convenience functions sim.att and sim.obs are easiest to use;
# here assocCol is called directly on the farms dataset from MASS
library(qlcMatrix)
library(MASS)
# investigate the relation between the individual values;
# this is similar to Multiple Correspondence Analysis (see mca in MASS)
f <- splitTable(farms)
s <- assocCol(f$OV, f$AV)
rownames(s) <- f$values
# cluster the values, using negative association as dissimilarity
plot(hclust(as.dist(-s)))
```
