dsm_canonical_matrix: Canonical Formats for a DSM Co-occurrence Matrix (wordspace)

dsm.canonical.matrix  R Documentation

Description

Test whether a co-occurrence matrix is represented in a canonical DSM format, or convert a matrix to canonical format.

Usage


dsm.is.canonical(x, nonneg.check = FALSE)

dsm.canonical.matrix(x, triplet = FALSE, annotate = FALSE, nonneg.check = FALSE)

Arguments

x

a dense or sparse DSM co-occurrence matrix

nonneg.check

if TRUE, check whether all elements of the matrix are non-negative

triplet

if TRUE and if x is sparse, return a matrix in triplet format (class dgTMatrix) rather than in column-compressed format (class dgCMatrix). Note that this is not a canonical DSM format.

annotate

if TRUE, annotate x with attributes sparse and nonneg, indicating whether the matrix is in sparse representation and non-negative, respectively. Non-negativity is only checked if nonneg.check=TRUE; otherwise an existing attribute will be passed through without validation.

Details

Note that conversion into canonical format may result in unnecessary copying of x, especially if annotate=TRUE. For optimal performance, set annotate=FALSE whenever possible and avoid calling dsm.canonical.matrix() as a no-op, i.e. on a matrix that is already in canonical format.

Instead of

    M <- dsm.canonical.matrix(M, annotate=TRUE, nonneg.check=TRUE)

use

    M.flags <- dsm.is.canonical(M, nonneg.check=FALSE)
    if (!M.flags$canonical) M <- dsm.canonical.matrix(M)
    M.flags <- dsm.is.canonical(M, nonneg.check=TRUE)

If nonneg.check=FALSE and x has an attribute nonneg, its value is accepted without validation.

Checking non-negativity can be expensive and can create substantial memory overhead. The check is guaranteed to be efficient for a matrix in canonical format.
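
The pass-through behaviour of the nonneg attribute can be illustrated with a small sketch. The matrix and the manually attached attribute below are made up for illustration; the example assumes that the wordspace package is installed and that the functions behave as described on this page.

```r
library(wordspace)

# Toy dense matrix (illustrative values) containing one negative cell
M <- matrix(c(1, -2, 3, 4), nrow = 2)

# Attach a wrong flag by hand, simulating an annotation that has gone stale
attr(M, "nonneg") <- TRUE

# With nonneg.check=FALSE the existing attribute is accepted without validation
stale <- dsm.is.canonical(M, nonneg.check = FALSE)

# With nonneg.check=TRUE the cells of the matrix are actually inspected
fresh <- dsm.is.canonical(M, nonneg.check = TRUE)

print(stale$nonneg)  # value taken from the attribute
print(fresh$nonneg)  # value computed from the data
```

If the two values disagree, the attribute no longer describes the data, so it is safer to request an explicit check whenever a matrix may have been modified after annotation.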

Value

dsm.is.canonical() returns a data frame containing a single row with the following items:

sparse

whether x is a sparse (TRUE) or dense (FALSE) matrix

canonical

whether x is in canonical format

nonneg

whether all cells of x are non-negative; may be NA if nonneg.check=FALSE
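
As a rough illustration of these three items, the following sketch queries the flags for a dense matrix and for two sparse representations of the same toy data. The values are invented for the example, and the Matrix package is assumed to be available alongside wordspace.

```r
library(wordspace)
library(Matrix)

# Toy 2 x 3 co-occurrence counts (illustrative values only)
M.dense <- matrix(c(2, 0, 0, 3, 1, 0), nrow = 2)

# The same data as a column-compressed sparse matrix (dgCMatrix) ...
M.csc <- sparseMatrix(i = c(1, 2, 1), j = c(1, 2, 3), x = c(2, 3, 1),
                      dims = c(2, 3))

# ... and in triplet form (dgTMatrix)
M.triplet <- as(M.csc, "TsparseMatrix")

# Each call returns a one-row data frame with columns sparse, canonical, nonneg
f.dense   <- dsm.is.canonical(M.dense)
f.csc     <- dsm.is.canonical(M.csc)
f.triplet <- dsm.is.canonical(M.triplet)

print(f.dense)
print(f.csc)
print(f.triplet)
```

Since nonneg.check is left at its default here and none of the matrices carries a nonneg attribute, the nonneg column should be NA in all three results.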

dsm.canonical.matrix() returns a matrix in canonical DSM format, i.e.

  • of class matrix for a dense matrix (even if x is a denseMatrix object);

  • of class dgCMatrix for a sparse matrix.

If triplet=TRUE and x is sparse, it returns a matrix of class dgTMatrix, which is not a canonical format.

If annotate=TRUE, the returned matrix has attributes sparse and nonneg (possibly NA).
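
The return values described above can be inspected with a short sketch along the following lines. The input matrices are toy data, and the example assumes that both wordspace and Matrix are installed.

```r
library(wordspace)
library(Matrix)

# Toy sparse matrix in column-compressed form (illustrative values only)
M <- sparseMatrix(i = c(1, 2, 1), j = c(1, 2, 3), x = c(2, 3, 1),
                  dims = c(2, 3))

# Annotated canonical matrix: attributes sparse and nonneg are attached
M.ann <- dsm.canonical.matrix(M, annotate = TRUE, nonneg.check = TRUE)
print(attr(M.ann, "sparse"))
print(attr(M.ann, "nonneg"))

# Triplet output for a sparse input (not a canonical DSM format)
M.T <- dsm.canonical.matrix(M, triplet = TRUE)
print(is(M.T, "dgTMatrix"))

# A dense Matrix-package object is converted to a base R matrix
D <- Matrix(c(1, 2, 3, 4), nrow = 2, sparse = FALSE)
D.base <- dsm.canonical.matrix(D)
print(is.matrix(D.base))
```

Note that the first call uses annotate=TRUE only to show the attributes; in performance-sensitive code the advice in the Details section (convert only when needed, annotate sparingly) still applies.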

Author(s)

Stephanie Evert (https://purl.org/stephanie.evert)


wordspace documentation built on Aug. 23, 2022, 1:06 a.m.