get_svd: Compute random singular value decomposition (rSVD)

View source: R/pmi_svd.R

get_svd R Documentation

Compute random singular value decomposition (rSVD)

Description

Random SVD is an efficient approximation of truncated SVD, in which only the first principal components are returned. It is computed with the rsvd package, whose author suggests that, for it to be efficient, the number of dimensions requested k should satisfy k < n / 4, where n is the number of features; otherwise one should use either full SVD or truncated SVD instead. When computing SVD on a PMI matrix, we only want to use the singular values corresponding to positive eigenvalues. We do not know beforehand how many will have to be filtered out, so there are two parameters: 'embedding_dim' for the requested output dimension, and 'svd_rank' for the actual SVD computation, which defaults to twice the requested dimension. A warning may be thrown if 'svd_rank' needs to be manually increased. Computation may be expensive, and manually optimizing the 'svd_rank' parameter might save significant time.
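The filtering step described above can be illustrated with a minimal base-R sketch. It is not the package's implementation: it uses `svd()` instead of rsvd, a made-up 3 x 3 symmetric matrix in place of a real PMI matrix, and it assumes that dimensions are kept when the left and right singular vectors agree in sign, which for a symmetric matrix identifies the positive eigenvalues. The square-root scaling in the last step is one common convention and is likewise an assumption.

```r
# Small symmetric matrix standing in for a PMI matrix (made-up values).
m <- matrix(c( 2, -1,  0,
              -1, -3,  1,
               0,  1,  1), nrow = 3, byrow = TRUE)

# Full base-R SVD; the package uses rsvd, which this sketch avoids.
dec <- svd(m)

# For a symmetric matrix, u_i and v_i are equal when the eigenvalue is
# positive and have opposite signs when it is negative.
sign_match <- colSums(dec$u * dec$v)
keep <- which(sign_match > 0)

# Keep at most `embedding_dim` of the retained dimensions.
embedding_dim <- 2
keep <- head(keep, embedding_dim)

# Assumed scaling: left singular vectors times sqrt of singular values.
m_emb <- dec$u[, keep, drop = FALSE] %*%
  diag(sqrt(dec$d[keep]), nrow = length(keep))
```

Because some of the computed dimensions are discarded, the rank computed has to exceed the requested output dimension, which is why 'svd_rank' is larger than 'embedding_dim' by default.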

Usage

get_svd(m_pmi, embedding_dim = 100, svd_rank = embedding_dim * 2)

Arguments

m_pmi

Pointwise mutual information matrix.

embedding_dim

Number of output embedding dimensions requested.

svd_rank

Number of SVD dimensions to compute.
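As a rough aid for choosing these arguments, the k < n / 4 rule of thumb can be checked before running the computation. The helper below, `check_svd_rank`, is hypothetical and not part of nlpembeds; it assumes that the k in the rule refers to 'svd_rank', the rank actually passed to the SVD step, and that n is the number of rows of the PMI matrix.

```r
# Hypothetical helper, not part of nlpembeds: checks the k < n / 4 rule
# of thumb, assuming k is svd_rank and n is the number of features.
check_svd_rank <- function(n_features, embedding_dim = 100,
                           svd_rank = embedding_dim * 2) {
  list(svd_rank = svd_rank,
       limit = n_features / 4,
       efficient = svd_rank < n_features / 4)
}

# With the defaults, svd_rank is 200.
res_large <- check_svd_rank(n_features = 1000)  # limit is 250
res_small <- check_svd_rank(n_features = 500)   # limit is 125
```

When the check fails, the description suggests that full or truncated SVD may be the better choice, or that a smaller 'svd_rank' could be tried.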

Value

SVD rectangular matrix

Examples

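library(nlpembeds)

# Toy EHR data: code counts per patient and month
df_ehr = data.frame(Patient = c(1, 1, 2, 1, 2, 1, 1, 3, 4),
                    Month = c(1, 1, 1, 2, 2, 3, 3, 4, 4),
                    Parent_Code = c('C1', 'C2', 'C2', 'C1', 'C1', 'C1',
                                    'C2', 'C3', 'C4'),
                    Count = 1:9)

# Build the co-occurrence matrix from the EHR data
spm_cooc = build_df_cooc(df_ehr)

# Compute the PMI matrix, then request 2 embedding dimensions
# (svd_rank defaults to embedding_dim * 2)
m_pmi = get_pmi(spm_cooc)
m_svd = get_svd(m_pmi, embedding_dim = 2)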


nlpembeds documentation built on April 4, 2025, 4:41 a.m.