
qlcMatrix-package

R Documentation

Utility sparse matrix functions for Quantitative Language Comparison (QLC)

Description

This package contains various functions that extend the functionality of the Matrix package for working with sparse matrices. Some of the functions are very general, while others are highly specific to the special data formats used in quantitative language comparison.

Details

Package:	qlcMatrix
Type:	Package
Version:	0.9.8
Date:	2024-05-06
License:	GPL-3

This package contains several different kinds of functions.

First, some general utility functions to deal with sparse matrices: (i) rowMax to compute and identify row-wise maxima and minima in sparse matrices, (ii) rKhatriRao to remove empty rows in a KhatriRao product (but still get the right rownames) and (iii) rSparseMatrix to produce random sparse matrices. There are also some experimental basic methods for handling sparse arrays ("tensors"), most interestingly unfold.
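To make the idea of trimming a Khatri-Rao product concrete, here is a minimal sketch that uses only the Matrix package. It does not call rKhatriRao. The "a:x" style row labels are an assumption made for this illustration and are not necessarily the labels the package itself produces.

```r
library(Matrix)

# Two small sparse matrices with the same number of columns
X <- sparseMatrix(i = c(1, 2), j = c(1, 2), x = 1, dims = c(2, 2),
                  dimnames = list(c("a", "b"), NULL))
Y <- sparseMatrix(i = c(1, 1, 2), j = c(1, 2, 2), x = 1, dims = c(2, 2),
                  dimnames = list(c("x", "y"), NULL))

# Column-wise Kronecker product: one row for every (row of X, row of Y) pair
K <- KhatriRao(X, Y)

# Label each row by the pair it came from (illustrative naming, an assumption)
rownames(K) <- paste(rep(rownames(X), each = nrow(Y)),
                     rep(rownames(Y), times = nrow(X)), sep = ":")

# Drop the rows that hold no non-zero entries; the labels travel with the rows
keep <- rowSums(K != 0) > 0
K_trim <- K[keep, , drop = FALSE]
```

Labelling the rows before subsetting is what keeps the names aligned with the surviving rows; naming after the subset would require tracking which pairs were dropped.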

Second, some general functions to compute associations between the columns of sparse matrices, with possibilities for extension to ad-hoc measures: cosSparse, corSparse, and assocSparse. There are special versions of these for nominal data: cosNominal and assocNominal.
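As an illustration of what a column-wise association measure computes, the following sketch calculates cosine similarities between the columns of a sparse matrix by hand, using only the Matrix package. It is not the implementation of cosSparse, and it ignores any weighting or alternative norms; it only shows the underlying arithmetic while staying in sparse form.

```r
library(Matrix)

# 4 observations (rows) by 3 variables (columns), stored sparsely
X <- sparseMatrix(
  i = c(1, 2, 2, 3, 4),
  j = c(1, 1, 2, 2, 3),
  x = 1,
  dims = c(4, 3)
)

# Euclidean norm of every column
col_norms <- sqrt(colSums(X^2))

# Scale the cross-product X'X on both sides by the inverse column norms
D <- Diagonal(x = 1 / col_norms)
S <- D %*% crossprod(X) %*% D

# S[i, j] is the cosine of the angle between columns i and j of X
```

Scaling with a diagonal matrix instead of dividing by a dense matrix of norm products keeps the result sparse, which matters when X has many columns. The sketch assumes no column is entirely zero, since that would give a zero norm.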

Third, there are three central functions needed to efficiently turn data from quantitative language comparison into sparse matrices. These basic functions are then used by the high-level functions in this package. Although these functions might seem almost trivial, they form the basis for many highly complex computations. They are ttMatrix, pwMatrix and jMatrix.
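To show why such a seemingly trivial building block is useful, the following sketch builds a type-token indicator matrix by hand with the Matrix package. It does not call ttMatrix, and the layout (types as rows, tokens as columns) is stated here as an assumption of the sketch.

```r
library(Matrix)

# A sequence of tokens and the distinct types occurring in it
tokens <- c("b", "a", "n", "a", "n", "a")
types <- sort(unique(tokens))

# One column per token, with a single 1 in the row of its type
TT <- sparseMatrix(
  i = match(tokens, types),
  j = seq_along(tokens),
  x = 1,
  dims = c(length(types), length(tokens)),
  dimnames = list(types, NULL)
)

# Type frequencies are simply the row sums
freq <- rowSums(TT)

# Multiplying by the transpose marks which tokens share a type
same_type <- crossprod(TT)
```

Once the data is in this form, counting and comparison reduce to sparse matrix products, which is the sense in which simple indicator matrices can underlie more complex computations.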

Fourth, there are some high-level convenience functions that take specific data formats from quantitative language comparison and turn them into sets of sparse matrices for efficient computation. They might also be useful for other data types, but various detailed decisions are specifically tailored to the envisioned data types. These functions are splitTable, splitStrings, splitWordlist, and splitText.

Finally, there are various shortcuts to directly compute similarity matrices from various kinds of data: sim.nominal, sim.words, sim.strings, sim.wordlist. These are tailored to specific kinds of data, though they might also be useful elsewhere. The code consists mostly of simple wrappers around the split and cos/assoc functions, so it should not be difficult to adapt these functions to other needs.

Author(s)

Michael Cysouw <cysouw@mac.com>

References

Cysouw, Michael. 2014. Matrix Algebra for Language Comparison. Manuscript.

Mayer, Thomas and Michael Cysouw. 2012. Language comparison through sparse multilingual word alignment. Proceedings of the EACL 2012 Joint Workshop of LINGVIS & UNCLH, 54–62. Avignon: Association for Computational Linguistics.

Prokić, Jelena and Michael Cysouw. 2013. Combining regular sound correspondences and geographic spread. Language Dynamics and Change 3(2). 147–168.

qlcMatrix documentation built on June 25, 2024, 1:16 a.m.