DSM_GoodsMatrix: A Scored Co-occurrence Matrix of Nouns Denoting Goods...

DSM_GoodsMatrix    R Documentation

A Scored Co-occurrence Matrix of Nouns Denoting Goods (wordspace)

Description

A pre-scored verb-object co-occurrence matrix for 240 target nouns denoting goods and the three feature verbs own, buy and sell. This matrix is useful for illustrating the application and purpose of dimensionality reduction techniques.

Usage


DSM_GoodsMatrix

Format

A numeric matrix with 240 rows, one per target noun denoting goods, and 4 columns:

own, buy, sell: association scores for co-occurrences of the nouns with the verbs own, buy and sell

fringe: an indicator of how close each point is to the “fringe” of the data set (ranging from 0 to 1)

Details

Co-occurrence data are based on verb-object dependency relations in the British National Corpus, obtained from DSM_VerbNounTriples_BNC. Only nouns that co-occur with all three verbs are included in the data set.

The co-occurrence matrix is weighted with non-sparse log-likelihood (simple-ll) and an additional logarithmic transformation (log). Row vectors are not normalized.
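The weighting described above can be illustrated with a minimal base R sketch. It assumes the usual definition of simple log-likelihood, G2 = 2 * (O * log(O / E) - (O - E)), with a negative sign when the observed frequency O is below the expected frequency E (the non-sparse variant), followed by a signed logarithm sign(x) * log(|x| + 1). The helper names simple_ll and signed_log and the frequencies are hypothetical and not part of wordspace; in the package itself, this weighting presumably corresponds to dsm.score() with score="simple-ll", transform="log" and sparse=FALSE.

```r
# Signed (non-sparse) simple log-likelihood; assumes O > 0, which holds here
# because only nouns that co-occur with all three verbs are included.
simple_ll <- function(O, E) {
  G2 <- 2 * (O * log(O / E) - (O - E))  # always >= 0
  ifelse(O >= E, G2, -G2)               # negative if fewer co-occurrences than expected
}

# Signed logarithmic transformation, which dampens very large scores
signed_log <- function(x) sign(x) * log(abs(x) + 1)

# Hypothetical observed and expected co-occurrence frequencies
O <- c(10, 5, 5)
E <- c(5, 10, 5)

scores <- signed_log(simple_ll(O, E))
print(round(scores, 3))
```

The first score is positive (attraction), the second negative (repulsion), and the third is zero because O equals E.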

The fringeness score in column fringe indicates how close a data point is to the fringe of the data set. Values are distance quantiles based on the PCA-whitened Manhattan distance from the centroid. For example, fringe >= .8 selects the 20% of points that are closest to the fringe. Fringeness is mainly used to select points to be labelled in plots or to take stratified samples from the data set.
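One plausible reading of this fringeness computation is sketched below in base R. The exact procedure used to build the data set is not given here, so this is an illustration under stated assumptions rather than the original code: the matrix X is randomly generated stand-in data, whitening divides each principal component by its standard deviation, and quantiles are taken as relative ranks.

```r
set.seed(1)

# Hypothetical stand-in for the three score columns (NOT the real data)
X <- matrix(rnorm(240 * 3), nrow = 240,
            dimnames = list(NULL, c("own", "buy", "sell")))

pca <- prcomp(X)                     # centers the data, so the centroid is the origin
Z <- sweep(pca$x, 2, pca$sdev, "/")  # whitening: unit variance on every component

d <- rowSums(abs(Z))                 # Manhattan distance from the centroid
fringe <- rank(d) / length(d)        # distance quantile in (0, 1]

mean(fringe >= .8)                   # share of points selected, roughly 20%
```

Because the score is a quantile, a threshold such as fringe >= .8 always selects about the same proportion of points, regardless of the scale of the underlying distances.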

Examples


DSM_GoodsMatrix[c("time", "goods", "service"), ]
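A further sketch, not part of the original examples, shows how the matrix might be used to illustrate dimensionality reduction: the three score columns are projected onto two principal components, and only points near the fringe are labelled. It uses base R prcomp() rather than any wordspace-specific function; the .8 threshold and the plotting parameters are arbitrary choices.

```r
library(wordspace)

M <- DSM_GoodsMatrix[, c("own", "buy", "sell")]  # association scores only
fringe <- DSM_GoodsMatrix[, "fringe"]

pca <- prcomp(M)       # columns are centered by default
proj <- pca$x[, 1:2]   # coordinates on the first two principal components

lab <- fringe >= .8    # label only the points closest to the fringe
plot(proj, pch = 20, col = "grey60")
text(proj[lab, ], labels = rownames(proj)[lab], cex = .7, pos = 3)
```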


wordspace documentation built on Aug. 23, 2022, 1:06 a.m.