svs-package: Tools for Semantic Vector Spaces

svs-package    R Documentation

Tools for Semantic Vector Spaces

Description

This package offers various tools for semantic vector spaces. There are techniques for correspondence analysis (simple, multiple and discriminant), latent semantic analysis, probabilistic latent semantic analysis, non-negative matrix factorization, latent class analysis, EM clustering, logratio analysis and log-multiplicative (association) analysis. Furthermore, the package has specialized distance measures and plotting functions as well as some helper functions.

Contents

This package contains the following raw data files (in the folder extdata):

SndT_Fra.txt

Seventeen Dutch source words and their French translations.

SndT_Eng.txt

Seventeen Dutch source words and their English translations.

InvT_Fra.txt

Seventeen Dutch target words and their French source words.

InvT_Eng.txt

Seventeen Dutch target words and their English source words.

Ctxt_Dut.txt

Context words for seventeen Dutch words.

Ctxt_Fra.txt

Context words for seventeen Dutch words translated from French.

Ctxt_Eng.txt

Context words for seventeen Dutch words translated from English.
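As a minimal sketch, one of the raw data files above can be located with base R's system.file(). The read.table() arguments (tab-separated, header row, UTF-8) are assumptions about the file layout and may need adjusting; the object name SndT_Fra is only illustrative.

```r
# Look up one of the raw data files listed above.
# system.file() returns "" when the package or the file cannot be found.
path <- system.file("extdata", "SndT_Fra.txt", package = "svs")

if (nzchar(path)) {
  # Assumption: tab-separated text with a header row; adjust if the file differs.
  SndT_Fra <- read.table(path, header = TRUE, sep = "\t", quote = "\"",
                         encoding = "UTF-8", stringsAsFactors = FALSE)
  str(SndT_Fra)
} else {
  message("Package 'svs' (or the requested file) is not available in this library.")
}
```

The same pattern should apply to the other file names in the list.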

The techniques in this package, each implemented as a fast procedure, are:

fast_sca

Simple correspondence analysis.

fast_mca

Multiple correspondence analysis.

fast_dca

Discriminant correspondence analysis.

fast_lsa

Latent semantic analysis.

fast_psa

Probabilistic latent semantic analysis.

fast_nmf

Non-negative matrix factorization.

fast_lca

Latent class analysis.

fast_E_M

EM clustering.

fast_lra

Logratio analysis.

fast_lma

Log-multiplicative (association) analysis.
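To make the first technique in this list concrete, here is a base R sketch of simple correspondence analysis via the singular value decomposition. It illustrates the general method only and is not the package's fast_sca implementation; the toy counts and all object names are made up for illustration.

```r
# Toy contingency table (made-up counts, for illustration only)
N <- matrix(c(20,  5,  2,
               4, 15,  6,
               1,  7, 18),
            nrow = 3, byrow = TRUE,
            dimnames = list(word = c("w1", "w2", "w3"),
                            context = c("c1", "c2", "c3")))

P  <- N / sum(N)      # correspondence matrix (relative frequencies)
r  <- rowSums(P)      # row masses
cm <- colSums(P)      # column masses

# Standardized residuals: (p_ij - r_i * c_j) / sqrt(r_i * c_j)
S <- (P - outer(r, cm)) / sqrt(outer(r, cm))

dec <- svd(S)
sv  <- dec$d          # singular values; their squares are the principal inertias

# Principal coordinates of the rows and the columns
row_pc <- sweep(dec$u, 1, sqrt(r),  "/") %*% diag(sv)
col_pc <- sweep(dec$v, 1, sqrt(cm), "/") %*% diag(sv)
rownames(row_pc) <- rownames(N)
rownames(col_pc) <- colnames(N)

# Total inertia equals the chi-square statistic divided by the sample size
total_inertia <- sum(sv^2)
```

The first two columns of row_pc and col_pc are the coordinates usually shown in a correspondence analysis plot; the last singular value is numerically zero because the residuals are centered.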

The complete overview of the local and global weighting functions in this package can be found in the help page weighting_functions.

The specialized distance measures are:

dist_chisquare

Chi-square distance.

dist_cosine

Cosine distance.

dist_wrt

Distance with respect to a certain point.

dist_wrt_centers

Distance with respect to cluster centers.
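The first two measures can be sketched in base R as follows. This is a conceptual illustration rather than the package's own code: the chi-square distance is computed between row profiles, and the cosine distance is taken here as one minus the cosine similarity, which is an assumed convention. The toy matrix is made up so that rows a and c are proportional.

```r
# Toy frequency matrix; row "c" is exactly twice row "a"
N <- rbind(a = c(10, 0, 5),
           b = c( 2, 8, 4),
           c = c(20, 0, 10))

# Chi-square distance between rows: Euclidean distance between row profiles,
# with every column weighted by the inverse of its mass.
profiles <- N / rowSums(N)              # each row sums to 1
col_mass <- colSums(N) / sum(N)
d_chisq  <- dist(sweep(profiles, 2, sqrt(col_mass), "/"))

# Cosine distance (assumed here to be 1 - cosine similarity)
norms <- sqrt(rowSums(N^2))
d_cos <- as.dist(1 - tcrossprod(N) / outer(norms, norms))

d_chisq
d_cos
```

Both measures ignore overall frequency, so the proportional rows a and c end up at distance zero (up to rounding) under either one.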

The specialized plotting functions are:

cd_plot

Cumulative distribution plot.

pc_plot

Parallel coordinate plot.

There are two helper functions for correspondence analysis:

freq_ca

Compute level frequencies (for a factor).

centers_ca

Compute coordinates for cluster centers.

There is one helper function for pvclust:

complete_pvpick

Complete the output of pvpick.

There is one helper function for igraph:

layout4bipartite

Create a layout matrix for a bipartite graph.

The remaining helper functions in this package are:

rep4dat

Repeat the rows of a data frame according to a frequency column.

vec2ddc

Transform a vector into a double-coded matrix.

dat2ddc

Transform a data frame into a double-coded matrix.

vec2ind

Transform a vector into an indicator matrix.

tab2dat

Transform a table into a data frame.

tab2ind

Transform a table into an indicator matrix.

dat2ind

Transform a data frame into an indicator matrix.

outerec

Recursive application of the outer product.

pmi

Pointwise mutual information.

MI

Mutual information.

log_or_0

Logarithmic transform.
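Several of these helpers correspond to short base R idioms. The sketch below shows the underlying ideas (table to data frame, repeating rows by frequency, indicator coding, pointwise mutual information and mutual information) without calling the package functions, whose arguments are not described here. The toy table, the object names and the choice of base-2 logarithms are assumptions for illustration.

```r
# Toy two-way table (made-up counts)
tab <- as.table(matrix(c(3, 1, 0, 2), nrow = 2,
                       dimnames = list(source = c("a", "b"),
                                       target = c("x", "y"))))

# Table -> long data frame with a frequency column (source, target, Freq)
long <- as.data.frame(tab, stringsAsFactors = TRUE)

# Repeat every row according to its frequency: one row per observation
cases <- long[rep(seq_len(nrow(long)), long$Freq), c("source", "target")]
rownames(cases) <- NULL

# Indicator matrix: one 0/1 column per level of each factor
ind <- cbind(model.matrix(~ source - 1, data = cases),
             model.matrix(~ target - 1, data = cases))

# Pointwise mutual information: log of observed over expected proportions
p        <- tab / sum(tab)
expected <- outer(rowSums(p), colSums(p))
pmi_tab  <- log2(p / expected)          # -Inf for empty cells

# Mutual information: weighted sum of PMI, empty cells contribute zero
mi <- sum(ifelse(p > 0, p * log2(p / expected), 0))
```

Treating empty cells as contributing zero follows the usual convention that 0 * log(0) is taken to be 0.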

Further reference

  • Many packages contain correspondence analysis: ca, FactoMineR, MASS and others.

  • For latent semantic analysis there is also the package lsa.

  • The package NMF provides more flexibility for non-negative matrix factorization.

  • For topic models there are the packages lda and topicmodels.

  • Latent class analysis can also be run in the package poLCA.

  • For logratio analysis there is also the package easyCODA.

  • The package gnm offers much flexibility for association analysis, i.e. fitting log-multiplicative or Goodman's RC models.

Link

As of 2023, this package is part of Module 10, "Multivariate data analysis with R", of the Summer School Methods in Language Sciences.

Author

Koen Plevoets, koen.plevoets@ugent.be

Acknowledgements

This package has benefited greatly from the helpful comments of Lore Vandevoorde, Pauline De Baets and Gert De Sutter. Thanks to Kurt Hornik, Uwe Ligges and Brian Ripley for their valuable recommendations when proofing this package.


svs documentation built on June 24, 2024, 5:07 p.m.