```r
library(text2map)
```
Concept Class Analysis (CoCA) is a method for grouping documents based on the schematic similarities in their engagement with multiple semantic directions. It generalizes Correlational Class Analysis (CCA), a method developed for survey data, to texts. We outline the method in more detail in our *Sociological Science* paper, "Concept Class Analysis: A Method for Identifying Cultural Schemas in Texts."
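The notion of "schematic similarity" that CoCA inherits from CCA can be sketched in a few lines of base R. This is a toy illustration, not text2map's internal code: the document-by-direction scores are made-up values standing in for the role that concept mover's distances play in CoCA, and all document and direction names are hypothetical. Two documents count as schematically similar when their scores across the directions are strongly correlated in either sign, so the sketch takes absolute correlations between rows.

```r
# Toy (hypothetical) document-by-direction scores: rows are documents,
# columns are semantic directions. These values are made up.
scores <- matrix(
  c( 1.0,  0.5,  0.0,   # doc_a
     2.0,  1.0,  0.0,   # doc_b: doc_a scaled up
    -1.0, -0.5,  0.0,   # doc_c: doc_a mirrored
     0.0,  1.0, -1.0),  # doc_d: a different pattern
  nrow = 4, byrow = TRUE,
  dimnames = list(c("doc_a", "doc_b", "doc_c", "doc_d"),
                  c("affluence", "skill", "education"))
)

# cor() correlates columns, so transpose to correlate documents (rows).
# Absolute values treat mirrored patterns as the same schema.
doc_sim <- abs(cor(t(scores)))
round(doc_sim, 2)
```

Here `doc_a`, `doc_b`, and `doc_c` all correlate at 1 in absolute value even though they sit at different positions on the directions, while `doc_d` correlates at 0.5 with each of them.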
After getting familiar with `CMDist()`, the first step in using CoCA is to build two or more semantic directions. For example, here are three semantic directions related to socio-economic status. Note that you must first load or create word embeddings (here, `my_wv`).
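To make the idea of a direction concrete: one common way to build a semantic direction from juxtaposed pairs is to subtract each pair's vectors and average the differences. The toy sketch below assumes that approach, using a made-up 3-dimensional embedding matrix (`toy_wv`) and hypothetical pairs; it illustrates the idea and is not necessarily how `get_direction()` computes a direction.

```r
# Hypothetical 3-dimensional embeddings for four words (rows = words).
toy_wv <- matrix(
  c(0.9, 0.1, 0.3,   # rich
    0.8, 0.2, 0.1,   # wealthy
    0.1, 0.2, 0.3,   # poor
    0.2, 0.1, 0.2),  # impoverished
  nrow = 4, byrow = TRUE,
  dimnames = list(c("rich", "wealthy", "poor", "impoverished"), NULL)
)

toy_pairs <- data.frame(
  additions = c("rich", "wealthy"),
  subtracts = c("poor", "impoverished"),
  stringsAsFactors = FALSE
)

# Subtract each pair's vectors, then average the differences
# to get a single direction vector.
diffs <- toy_wv[toy_pairs$additions, , drop = FALSE] -
  toy_wv[toy_pairs$subtracts, , drop = FALSE]
direction <- colMeans(diffs)
direction
```

Averaging over several pairs smooths out the idiosyncrasies of any single word pair, which is why each direction in the example uses four pairs.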
```r
# build juxtaposed pairs for each semantic direction
# (terms should appear in the embeddings' vocabulary)
pairs_01 <- data.frame(
  additions = c("rich", "richer", "affluence", "wealthy"),
  subtracts = c("poor", "poorer", "poverty", "impoverished")
)
pairs_02 <- data.frame(
  additions = c("skilled", "competent", "proficient", "adept"),
  subtracts = c("unskilled", "incompetent", "inproficient", "inept")
)
pairs_03 <- data.frame(
  additions = c("educated", "learned", "trained", "literate"),
  subtracts = c("uneducated", "unlearned", "untrained", "illiterate")
)

# get the vector for each direction
sd_01 <- get_direction(pairs_01, my_wv)
sd_02 <- get_direction(pairs_02, my_wv)
sd_03 <- get_direction(pairs_03, my_wv)

# row bind the directions into one matrix
sem_dirs <- rbind(sd_01, sd_02, sd_03)
```

Next, we feed our document-term matrix, word embeddings matrix, and the semantic directions matrix from above to the `CoCA()` function:

```r
classes <- CoCA(my_dtm, wv = my_wv, directions = sem_dirs)
print(classes)
```
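Once each document has a score on each direction, CCA-style classes come from treating documents as nodes, weighting ties by absolute correlations, and partitioning that network. The sketch below illustrates only that partitioning idea, assuming the igraph package and toy scores for six hypothetical documents; it uses igraph's leading-eigenvector community detection, which to our understanding is the algorithm used in CCA. It is not `CoCA()`'s internal code, and it skips CCA's step of dropping statistically insignificant ties.

```r
# Toy (hypothetical) scores for six documents on three directions:
# docs 1-3 share one pattern (shifted, scaled, or mirrored) and
# docs 4-6 share another pattern that is uncorrelated with the first.
scores <- matrix(
  c( 1,  0, -1,
     3,  2,  1,
    -1,  0,  1,
     1, -2,  1,
    -2,  4, -2,
     2, -1,  2),
  nrow = 6, byrow = TRUE,
  dimnames = list(paste0("doc_", 1:6), c("affluence", "skill", "education"))
)

# Absolute correlations between documents become tie weights;
# zero the diagonal so documents have no self-ties.
ties <- abs(cor(t(scores)))
diag(ties) <- 0

membership_ids <- NULL
if (requireNamespace("igraph", quietly = TRUE)) {
  g <- igraph::graph_from_adjacency_matrix(
    ties, mode = "undirected", weighted = TRUE, diag = FALSE
  )
  # Leading-eigenvector community detection on the weighted network;
  # edge weights are taken from the "weight" attribute by default.
  comms <- igraph::cluster_leading_eigen(g)
  membership_ids <- igraph::membership(comms)
  print(membership_ids)
}
```

Because the two toy patterns are uncorrelated, the network splits into two groups of three documents, one per shared schema.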
Finally, using the `plot()` function, we can generate simple visualizations of the schematic classes found:

```r
# this is a quick plot;
# designate which module to plot with `module =`
plot(classes, module = 1)
```