get_svd | R Documentation |
Randomized SVD is an efficient approximation of truncated SVD, in which only the first principal components are returned. It is computed with the rsvd package, whose author suggests that, for the method to be efficient, the number of requested dimensions k should satisfy k < n / 4, where n is the number of features; otherwise, one should use either full SVD or truncated SVD instead.

When computing SVD on PMI, we only want to keep the singular values that correspond to positive eigenvalues. We do not know beforehand how many will have to be filtered out, so there are two parameters: 'embedding_dim' for the requested output dimension, and 'svd_rank' for the rank of the actual SVD computation, which defaults to twice the requested dimension. A warning may be thrown if 'svd_rank' needs to be manually increased. The computation may be expensive, and manually tuning the 'svd_rank' parameter might save significant time.
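The filtering idea described above can be illustrated with a minimal sketch. It is not the package's implementation: base R `svd()` stands in for the randomized SVD, the input is assumed to be symmetric (as a co-occurrence PMI matrix typically is), and the sign test and square-root scaling are assumptions made for illustration. For a symmetric matrix, the singular values are the absolute values of the eigenvalues, and the left and right singular vectors of a component agree when its eigenvalue is positive and are opposite when it is negative.

```r
# Illustrative sketch only: base R svd() stands in for the randomized SVD,
# and the filtering rule below is an assumption, not the package's code.
set.seed(1)

# Build a small symmetric matrix with known eigenvalues
# (three positive, two negative, all with distinct magnitudes).
n <- 5
q <- qr.Q(qr(matrix(rnorm(n * n), n, n)))   # random orthogonal basis
eigenvalues <- c(5, -4, 3, -2, 1)
m_sym <- q %*% diag(eigenvalues) %*% t(q)

embedding_dim <- 2
svd_rank <- embedding_dim * 2               # compute more components than requested

full_svd <- svd(m_sym, nu = svd_rank, nv = svd_rank)

# A component has a positive eigenvalue when its left and right
# singular vectors point in the same direction.
is_positive <- colSums(full_svd$u * full_svd$v) > 0
kept <- head(which(is_positive), embedding_dim)

if (length(kept) < embedding_dim) {
  warning("Fewer positive components than requested; consider a larger svd_rank.")
}

kept_d <- full_svd$d[kept]

# Assumed scaling: weight the kept singular vectors by sqrt(singular value).
embedding <- full_svd$u[, kept, drop = FALSE] %*%
  diag(sqrt(kept_d), nrow = length(kept))
```

Here four components are computed, two of them are discarded because their eigenvalues are negative, and the two remaining ones form the requested two-dimensional output. This is why the rank of the computation has to exceed the requested output dimension.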
get_svd(m_pmi, embedding_dim = 100, svd_rank = embedding_dim * 2)
m_pmi | Pointwise mutual information matrix. |
embedding_dim | Number of output embedding dimensions requested. |
svd_rank | Number of SVD dimensions to compute. |
Rectangular matrix of SVD embeddings.
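As a rough aid for choosing 'svd_rank', the k < n / 4 rule of thumb mentioned in the description can be checked before running the computation. The helper below, `is_rsvd_efficient`, is hypothetical and not part of the package; it assumes that the rule applies to 'svd_rank', since that is the rank of the SVD actually computed.

```r
# Hypothetical helper (not part of the package): checks the k < n / 4
# rule of thumb, taking k to be the rank of the SVD actually computed.
is_rsvd_efficient <- function(n_features, svd_rank) {
  svd_rank < n_features / 4
}

embedding_dim <- 100
svd_rank <- embedding_dim * 2   # default relation used in the get_svd() signature

is_rsvd_efficient(n_features = 1000, svd_rank = svd_rank)  # TRUE: 200 < 250
is_rsvd_efficient(n_features = 500, svd_rank = svd_rank)   # FALSE: 200 >= 125
```

When the check fails, the description's advice applies: a full or truncated SVD may be the better choice.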
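# Toy EHR data: one row per patient, month, and code, with a count
df_ehr = data.frame(Patient = c(1, 1, 2, 1, 2, 1, 1, 3, 4),
                    Month = c(1, 1, 1, 2, 2, 3, 3, 4, 4),
                    Parent_Code = c('C1', 'C2', 'C2', 'C1', 'C1', 'C1',
                                    'C2', 'C3', 'C4'),
                    Count = 1:9)

# Build the co-occurrences, then derive the PMI matrix
spm_cooc = build_df_cooc(df_ehr)
m_pmi = get_pmi(spm_cooc)

# Request 2 output dimensions (svd_rank defaults to embedding_dim * 2 = 4)
m_svd = get_svd(m_pmi, embedding_dim = 2)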