SummarizeTopics: Summarize topics in a topic model

View source: R/topic_modeling_utilities.R


Create a data frame summarizing the contents of each topic in a model.





The input is a list (or S3 object) with three named matrices: 'phi', 'theta', and 'gamma'. This matches the output of many of textmineR's native topic modeling functions, such as FitLdaModel.
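The sketch below illustrates the expected input structure. It is written in Python with plain lists of rows standing in for R matrices. `check_model_shapes` is a hypothetical helper, not part of textmineR. It assumes textmineR's usual orientation: 'theta' is documents x topics, while 'phi' and 'gamma' are both topics x words.

```python
def check_model_shapes(model):
    # Hypothetical helper (not part of textmineR): confirm the three
    # matrices exist and that their dimensions line up.
    # Assumed orientation: theta is documents x topics,
    # phi and gamma are topics x words (lists of rows here).
    for name in ("phi", "theta", "gamma"):
        if name not in model:
            raise ValueError(f"model is missing the '{name}' matrix")
    theta, phi, gamma = model["theta"], model["phi"], model["gamma"]
    n_docs, n_topics, n_words = len(theta), len(theta[0]), len(phi[0])
    if any(len(row) != n_topics for row in theta):
        raise ValueError("every row of theta needs one value per topic")
    if len(phi) != n_topics or len(gamma) != n_topics:
        raise ValueError("phi and gamma need one row per topic")
    if any(len(row) != n_words for row in phi + gamma):
        raise ValueError("phi and gamma must cover the same vocabulary")
    return n_docs, n_topics, n_words


# Toy model: 3 documents, 2 topics, 3 vocabulary words.
toy_model = {
    "theta": [[0.7, 0.3], [0.2, 0.8], [0.5, 0.5]],
    "phi": [[0.5, 0.3, 0.2], [0.1, 0.1, 0.8]],
    "gamma": [[0.8, 0.7, 0.2], [0.2, 0.3, 0.8]],
}
print(check_model_shapes(toy_model))  # (3, 2, 3)
```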


'prevalence' is normalized so that the values across all topics sum to 100. If the 'theta' matrix has negative values (as can happen with an LSA model), a constant is added first so that the least prevalent topic has a prevalence of 0.
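A minimal Python sketch of that normalization follows. It assumes each topic's raw weight is its column sum of 'theta' and that the shift is applied whenever any entry of 'theta' is negative; both are assumptions made for illustration, not a copy of textmineR's source.

```python
def topic_prevalence(theta):
    # Sketch only: raw weight per topic = column sum of theta
    # (documents x topics, given as a list of rows).
    n_topics = len(theta[0])
    sums = [sum(row[k] for row in theta) for k in range(n_topics)]
    # LSA-style models can produce negative theta values: subtract the
    # smallest topic total from every topic so that total becomes 0.
    if any(value < 0 for row in theta for value in row):
        low = min(sums)
        sums = [s - low for s in sums]
    total = sum(sums)
    if total == 0:
        raise ValueError("all topics have equal weight after shifting")
    # Rescale so the prevalences sum to 100.
    return [100 * s / total for s in sums]


theta = [[0.7, 0.3], [0.2, 0.8], [0.5, 0.5]]
print([round(p, 2) for p in topic_prevalence(theta)])  # [46.67, 53.33]

theta_lsa = [[0.9, -0.2, 0.1], [0.4, 0.3, -0.5]]
print([round(p, 2) for p in topic_prevalence(theta_lsa)])  # [77.27, 22.73, 0.0]
```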

'coherence' is calculated using CalcProbCoherence.
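Probabilistic coherence scores the top terms of a topic by averaging P(b|a) - P(b) over each pair of top terms, where a ranks above b in the topic and the probabilities come from document co-occurrence. The Python sketch below implements that definition for a single topic. The function name, the parameter `m` (number of top terms), and the choice to skip pairs whose first term never occurs are assumptions for illustration; this is not CalcProbCoherence itself.

```python
from itertools import combinations


def prob_coherence(phi_row, dtm, m=5):
    # phi_row: P(word|topic) for every vocabulary word.
    # dtm: documents x words counts (list of rows); only presence matters.
    n_docs = len(dtm)
    # Top m terms, most probable first.
    top = sorted(range(len(phi_row)), key=lambda w: phi_row[w], reverse=True)[:m]
    # Binarize the counts: the set of documents containing each top term.
    docs_with = {w: {d for d, row in enumerate(dtm) if row[w] > 0} for w in top}
    scores = []
    for a, b in combinations(top, 2):  # combinations keeps order: a ranks above b
        if not docs_with[a]:
            continue  # P(b|a) is undefined when a never occurs
        p_b_given_a = len(docs_with[a] & docs_with[b]) / len(docs_with[a])
        p_b = len(docs_with[b]) / n_docs
        scores.append(p_b_given_a - p_b)
    return sum(scores) / len(scores) if scores else 0.0


dtm = [[2, 1, 0, 0], [1, 1, 0, 0], [0, 0, 3, 1], [0, 0, 1, 2]]
phi_row = [0.4, 0.35, 0.15, 0.1]
print(prob_coherence(phi_row, dtm, m=2))            # 0.5
print(round(prob_coherence(phi_row, dtm, m=4), 4))  # -0.1667
```

Terms that always appear together score positively, while pairs that never share a document pull the average down, which is why the four-term version of this toy topic scores below zero.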

'label' is the top label returned by LabelTopics. LabelTopics requires an "assignment" matrix, which is like a 'theta' matrix except that it is binary: a topic is either "in" a document or it is not. To build it, each value of theta is compared to a single threshold, the minimum across documents of each document's largest theta value (the minimum of the row maxima of theta). A topic is assigned to a document when its theta value reaches this threshold. Because every document's largest value is at least the threshold, each document has at least one topic assigned to it.
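The assignment rule can be sketched in a few lines of Python. `assign_topics` is a hypothetical name, and the use of `>=` (rather than `>`) is an assumption; it is the comparison that guarantees every document keeps its own top topic.

```python
def assign_topics(theta):
    # theta: documents x topics, as a list of rows.
    # Threshold = smallest of the per-document maxima.
    threshold = min(max(row) for row in theta)
    # 1 marks a topic as "in" the document, 0 marks it as absent.
    return [[1 if value >= threshold else 0 for value in row] for row in theta]


theta = [[0.5, 0.45, 0.05], [0.4, 0.35, 0.25], [0.1, 0.1, 0.8]]
print(assign_topics(theta))  # [[1, 1, 0], [1, 0, 0], [0, 0, 1]]
```

Here the threshold is 0.4 (the second document's maximum), so the first document receives two topics while every document receives at least one.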


An object of class data.frame or tibble with 6 columns: 'topic' is the name of the topic; 'label_1' is the top label from LabelTopics; 'prevalence' is the rough prevalence of the topic across all documents in the corpus; 'coherence' is the probabilistic coherence of the topic; 'top_terms_phi' lists the top 5 terms for each topic according to P(word|topic); and 'top_terms_gamma' lists the top 5 terms for each topic according to P(topic|word).
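Both top-term columns come from ranking the words in each topic's row of a topics x words matrix ('phi' or 'gamma') and keeping the highest-weighted ones. The Python sketch below shows that ranking and formats each result as a comma-separated string, as in the example output. `top_terms` and the toy vocabulary are hypothetical; ties are broken by vocabulary order because Python's `sorted` is stable.

```python
def top_terms(matrix, vocab, n=5):
    # matrix: topics x words (phi or gamma), as a list of rows.
    # vocab: the word for each column, in column order.
    result = []
    for row in matrix:
        ranked = sorted(range(len(vocab)), key=lambda w: row[w], reverse=True)
        result.append(", ".join(vocab[w] for w in ranked[:n]))
    return result


vocab = ["cell", "cells", "dna", "rna"]
phi = [[0.1, 0.4, 0.3, 0.2], [0.25, 0.05, 0.3, 0.4]]
print(top_terms(phi, vocab, n=2))  # ['cells, dna', 'rna, dna']
```

Calling the same helper on 'gamma' instead of 'phi' gives the P(topic|word) ranking, which tends to surface rarer, more topic-specific words.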


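Example output (a 30-topic model; the printed data frame is too wide for one screen, so R wraps the 'top_terms_phi' and 'top_terms_gamma' columns into separate blocks after the first four columns)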

     topic       label_1 prevalence coherence
t_1    t_1        health       2.81     0.054
t_2    t_2         cells       3.34     0.413
t_3    t_3      diabetes       2.72     0.167
t_4    t_4         cmybp       2.78     0.198
t_5    t_5           phd       3.30     0.154
t_6    t_6     infection       2.37     0.264
t_7    t_7          risk       3.60     0.247
t_8    t_8 mitochondrial       3.09     0.262
t_9    t_9            ma       3.29     0.165
t_10  t_10      research       4.53     0.091
t_11  t_11          cell       3.68     0.059
t_12  t_12         tumor       3.80     0.216
t_13  t_13           dna       4.20     0.176
t_14  t_14       imaging       3.75     0.112
t_15  t_15         cells       3.67     0.357
t_16  t_16     influenza       3.30     0.201
t_17  t_17  intervention       3.12     0.243
t_18  t_18          mast       2.05     0.486
t_19  t_19     treatment       3.65     0.153
t_20  t_20         sleep       3.27     0.377
t_21  t_21    microbiome       2.17     0.388
t_22  t_22            dr       3.73     0.032
t_23  t_23      research       3.20     0.044
t_24  t_24           ipf       3.08     0.240
t_25  t_25           rna       4.43     0.054
t_26  t_26          core       3.93     0.122
t_27  t_27      research       4.05     0.168
t_28  t_28  inflammation       3.01     0.085
t_29  t_29     difficile       3.78     0.049
t_30  t_30       develop       2.30     0.333
                                            top_terms_phi
t_1                    health, data, women, studies, swan
t_2                  ptc, brain, metastatic, brafv, cells
t_3   diabetes, influenza, numeracy, vaccine, centralized
t_4                injury, cmybp, cdk, function, fragment
t_5                  phd, hif, epithelial, model, project
t_6                muscle, sand, fly, infection, strength
t_7                      risk, factors, sud, early, study
t_8    mitochondrial, metabolic, redox, tissue, radiation
t_9                       ma, activity, aim, cortex, mice
t_10      research, program, cancer, students, prevention
t_11                   cells, cell, specific, lung, brain
t_12             cancer, dcis, pancreatic, tumor, genetic
t_13           dna, rna, transcription, repair, structure
t_14             imaging, clinical, cancer, develop, time
t_15       cells, carbon, metabolism, intracellular, cell
t_16                response, hiv, env, antibodies, human
t_17 intervention, fertility, health, behavior, community
t_18                            mast, cell, cells, fc, ri
t_19    treatment, methods, evaluation, clinical, develop
t_20        sleep, plasticity, synaptic, deficits, memory
t_21         microbiome, gut, crc, psoriasis, composition
t_22               dr, administrative, ucdc, research, te
t_23              health, research, hiv, disease, testing
t_24                    ipf, lung, cns, expression, based
t_25      structural, activity, natural, including, nmdar
t_26               core, center, projects, data, research
t_27       research, core, center, investigators, support
t_28          inflammation, hiv, study, battery, capacity
t_29         genetic, difficile, extinction, pd, approach
t_30            wall, large, stiffening, effects, disease
                                                      top_terms_gamma
t_1                             lepi, worker, mepi, biomechanical, mt
t_2        reprograms, vegf, sorafenib, chemotherapy, micrometastatic
t_3             immunologic, chlamydial, immunized, alaska, curricula
t_4  cleavage, cardiomyocytes, stabilizes, occlusion, hyperactivation
t_5                         xenotransplantation, iv, heparin, pig, hs
t_6                      parasitic, sarcopenia, vector, west, elderly
t_7              kendler, heavy, trajectories, nesarc, neurocognition
t_8                         couples, ratios, rt, adipocyte, reflected
t_9                     prefrontal, concerns, madr, impairments, arch
t_10           accepted, undergraduate, journals, actively, sponsored
t_11                  allergen, reversible, ccr, asthma, multiphtoton
t_12                        taste, glycomic, origin, glycoform, shows
t_13                           genomes, tefb, elongation, pairing, ac
t_14             false, nanoparticles, partial, nanosensors, emission
t_15               virulence, shigella, adherence, plaques, cytoplasm
t_16                                   mabs, vlbw, birth, enteric, ab
t_17                births, hospitalized, youth, adjunctive, military
t_18           truncation, interpreted, tubulin, resulted, attenuated
t_19         rules, constructing, challenging, surveillance, accuracy
t_20               eeg, impairment, psychiatric, spindle, eszopiclone
t_21                       psoriatic, baseline, lifestyle, fecal, nas
t_22                                   fo, teprorm, stnar, sc, vnrrps
t_23                                abm, hsieh, grocery, hopkins, hub
t_24                           encode, overarching, mrna, srt, mirnas
t_25                      substituent, plms, nrs, antifungal, lactone
t_26                           nsls, bnl, computing, instruments, ray
t_27                     lipidomics, invertebrate, mdibl, ctsa, cobre
t_28                    recharge, gadgets, myocyte, practically, mybp
t_29                   seeking, vorinostat, ido, allogeneic, exciting
t_30              cyclic, temporally, perivascular, doxycycline, tone

