Domain-class: Class "Domain"

Description

This class encapsulates fields and methods for processing and storing social semantic vector spaces: a given corpus of texts is analysed to derive an Eigenbasis whose first Eigenvalues explain 80% of the total stretch needed to expand its Eigenvectors to the mapping provided by the raw document-term matrix (constructed over the input corpus).

Through this approximation, the input corpus is lifted to a more semantic representation. This makes it possible to investigate the associative closeness relations among its term vectors, document vectors, or any combination of them used to represent the competence positions and performance locations of a Person or of groups.
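The truncation described above can be sketched in base R without the package. This is only an illustration, not the code behind spacify: it assumes that the "Eigenvalues" are the squared singular values of the term-by-document matrix and that the cut-off is the smallest number of dimensions whose cumulative share reaches 80%. The toy matrix is hand-built and loosely modelled on the corpus in the Examples section (assuming punctuation is stripped); the variable names are illustrative.

```r
# Toy term-by-document count matrix (8 terms x 4 documents), built by hand.
tdm <- matrix(
  c(2, 0, 0, 0,
    1, 2, 0, 0,
    0, 1, 1, 0,
    0, 0, 1, 0,
    0, 0, 1, 0,
    0, 0, 0, 1,
    0, 0, 0, 1,
    0, 0, 0, 1),
  ncol = 4, byrow = TRUE,
  dimnames = list(
    c("abc", "def", "ghi", "ijk", "lmno", "pqrs", "tuv", "wxyz"),
    paste0("doc", 1:4)
  )
)

# Singular value decomposition: tdm = U %*% diag(d) %*% t(V)
s <- svd(tdm)

# ASSUMPTION: Eigenvalues are taken to be the squared singular values.
eigenvalues <- s$d^2
share <- cumsum(eigenvalues) / sum(eigenvalues)

# Smallest number of dimensions whose cumulative share reaches 80%.
k <- which(share >= 0.8)[1]

# The three truncated matrices (term loadings, singular values, document loadings).
Tk <- s$u[, 1:k, drop = FALSE]
Sk <- s$d[1:k]
Dk <- s$v[, 1:k, drop = FALSE]

# Rank-k approximation of the raw matrix.
approximation <- Tk %*% diag(Sk, nrow = k) %*% t(Dk)
```

The three objects Tk, Sk and Dk correspond to the truncated matrices that the space field is described as holding.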

Fields

name:

The name of the domain (character string)

mode:

Mode of analysis: currently only 'terminology' is supported, though other views are theoretically plausible (such as focusing on the 'incidences' provided as documents or 'both').

textmatrix:

The TermDocumentMatrix holding the raw data from which an mpia space is constructed with spacify.

space:

The Eigensystem: holds the three truncated matrices resulting from the singular value decomposition.

processed:

Logical: denotes whether the space was already calculated from textmatrix.

signature:

A unique identifier of the domain (hash value of the space variable); automatically given by the spacify method.

traces:

Internally used to store temporary fold-in data (of positions).

termProximities:

The symmetric matrix of cosine proximities for all term pairs.

proximityThreshold:

The threshold above which associative closeness is considered 'near'.

identityThreshold:

The threshold above which associative closeness is considered 'same'.

visualiser:

The visualiser object.

version:

The version number of the Domain-class.
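How the termProximities, proximityThreshold and identityThreshold fields relate to each other can be illustrated with a small base R sketch. It is not the package's implementation. It assumes that term vectors are the term loadings scaled by the singular values, that the number of dimensions follows an 80% rule on squared singular values, and it uses hypothetical threshold values (0.3 and 0.99) chosen only for this example.

```r
# Toy term-by-document count matrix (8 terms x 4 documents), built by hand.
tdm <- matrix(
  c(2, 0, 0, 0,
    1, 2, 0, 0,
    0, 1, 1, 0,
    0, 0, 1, 0,
    0, 0, 1, 0,
    0, 0, 0, 1,
    0, 0, 0, 1,
    0, 0, 0, 1),
  ncol = 4, byrow = TRUE,
  dimnames = list(
    c("abc", "def", "ghi", "ijk", "lmno", "pqrs", "tuv", "wxyz"),
    paste0("doc", 1:4)
  )
)

s <- svd(tdm)

# ASSUMPTION: keep the first dimensions explaining 80% of the squared singular values.
share <- cumsum(s$d^2) / sum(s$d^2)
k <- which(share >= 0.8)[1]

# ASSUMPTION: term vectors are the term loadings scaled by the singular values.
term_vecs <- s$u[, 1:k, drop = FALSE] %*% diag(s$d[1:k], nrow = k)
rownames(term_vecs) <- rownames(tdm)

# Symmetric matrix of cosine proximities for all term pairs.
norms <- sqrt(rowSums(term_vecs^2))
prox <- tcrossprod(term_vecs) / outer(norms, norms)

# HYPOTHETICAL threshold values, for illustration only.
proximity_threshold <- 0.3
identity_threshold <- 0.99

near <- prox >= proximity_threshold   # term pairs considered 'near'
same <- prox >= identity_threshold    # term pairs considered 'same'
```

In this toy data, terms that occur only together in one document (such as pqrs and tuv) have identical vectors and therefore fall above both thresholds.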

Generics

plot

signature(x = "Domain", y = "ANY"): Visualise the projection surface of the domain as plain or perspective plot.

toponymy

signature(x = "Domain"): Analyse the places in the visualisation and label landmarks accordingly.

summary

signature(x = "Domain"): Print basic descriptive statistics about the data held.

Class Methods

initialize(name, ...):

Constructor; name should preferably be a unique identifier.

calculateTermProximities(mode, normalise, mincomp):

Determine the associative closeness of all term pairs in a given domain, defined as their cosine proximity in the Eigenspace. If normalise is set to TRUE (the default is FALSE), the cosine values are normalised according to their frequency distribution (interval-scaled from min to max). By default, mincomp is set to the ceiling of the square root of the number of rows of the term loadings Tk in the mpia space. If any component of the graph has fewer members than mincomp, its node with the highest betweenness is reattached to its closest node (with the original term proximity value as edge weight). This reduces the number of stray isolates and stray isolate components by loosely attaching them below the proximity threshold.

spacify():

Determine the optimal number of dimensions for the conceptual space and convert the source vectors to a space in its Eigenbasis.

corpus(x):

Create a document-term matrix from corpus x (either a list of files or a directory, a Source object, or a TermDocumentMatrix) and store it internally in the textmatrix field.

addTrace(vecs):

User interface for adding query document-term matrix vectors using fold-ins: project new texts into an existing Eigenspace.

fold_in(docvecs):

Internally used fold-in routine: returns a context vector that can be appended to the right singular Eigenvectors (rather than a document-term matrix vector, such as the one returned by fold_in in the lsa package).

getVocabulary():

Returns the list of terms used in the conceptual vector space.

getName():

Returns the (manually assigned) label of the domain.

getSpace():

Returns the space object.

setSpace(x):

Set the space object (an LSAspace).

submit():

Not yet implemented: hook for remoting via http://cRunch.kmi.open.ac.uk.

print():

Pretty printing of the domain object.

show():

Display the object by printing its key characteristics.

copy(shallow):

Internal routines required for upgrading Domain objects to newer versions of the Domain-class.
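The fold-in idea behind addTrace and fold_in can be sketched in base R using the standard latent semantic analysis formula, in which a new document vector q is projected as t(q) %*% Tk %*% solve(diag(Sk)). This is a generic illustration under that assumption, not the package's internal routine; the number of dimensions k = 2 and the query document are arbitrary choices made for the example.

```r
# Toy term-by-document count matrix (8 terms x 4 documents), built by hand.
tdm <- matrix(
  c(2, 0, 0, 0,
    1, 2, 0, 0,
    0, 1, 1, 0,
    0, 0, 1, 0,
    0, 0, 1, 0,
    0, 0, 0, 1,
    0, 0, 0, 1,
    0, 0, 0, 1),
  ncol = 4, byrow = TRUE,
  dimnames = list(
    c("abc", "def", "ghi", "ijk", "lmno", "pqrs", "tuv", "wxyz"),
    paste0("doc", 1:4)
  )
)

s <- svd(tdm)
k <- 2                              # arbitrary choice for illustration
Tk <- s$u[, 1:k, drop = FALSE]      # term loadings
Sk <- s$d[1:k]                      # singular values
Dk <- s$v[, 1:k, drop = FALSE]      # document loadings (right singular vectors)

# New document "def ghi" expressed over the existing vocabulary.
q <- setNames(numeric(nrow(tdm)), rownames(tdm))
q[c("def", "ghi")] <- 1

# Fold-in: project q into the existing space, t(q) %*% Tk scaled by 1 / Sk.
d_hat <- as.vector(crossprod(Tk, q)) / Sk

# The result is a context vector that can be appended to the document loadings.
Dk_ext <- rbind(Dk, d_hat)

# Sanity check: folding in an original document reproduces its row in Dk.
refold <- as.vector(crossprod(Tk, tdm[, 1])) / Sk
stopifnot(isTRUE(all.equal(refold, Dk[1, ])))
```

Because the new vector is projected onto the existing basis, the space itself is left unchanged; only the set of document positions grows.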

Author(s)

Fridolin Wild <fridolin.wild@open.ac.uk>

References

Fridolin Wild (2013): Meaningful Purposive Interaction Analysis.

See Also

lsa, tm, textmatrix

Examples

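	library(mpia)  # provides the Domain class
	library(tm)    # provides Corpus(), VectorSource(), TermDocumentMatrix()

	# create a domain, attach a toy corpus, then compute the space
	d = Domain(name="test")
	evidence = TermDocumentMatrix( Corpus( VectorSource( c("abc abc def",
		"def def ghi", "ghi ijk, lmno", "pqrs tuv wxyz") ) ) )
	d$corpus(evidence)
	d$spacify()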

mpia documentation built on May 2, 2019, 4:18 p.m.