High Performance Text Analysis

ACMI_contribution                           Calculate Average Conditional Mutual Information (ACMI)...
calculate_document_pair_distances           Document Distances
check_directory_name                        A function to ensure that a directory name is in the proper...
clean_document_text                         A function which cleans the raw text of a document provided...
color_words_by_frequency                    A function to generate LaTeX output from a dataframe...
color_word_table                            A function to generate LaTeX output from a dataframe...
combine_document_term_matrices              A function to combine multiple document term matrices into a...
compare_tf_idf_scalings                     A function that performs several different forms of TF-IDF...
congress_bills                              All versions of the first 20 bills introduced in the House...
contingency_table                           Generates a contingency table from user-specified document...
convert_quanteda_to_slam                    A function to convert a quanteda dfm object to a...
corenlp                                     Runs Stanford CoreNLP on a collection of documents
corenlp_blocked                             Runs Stanford CoreNLP on a collection of .txt files and...
count_ngrams                                An experimental function to efficiently generate a vocabulary...
count_words                                 A function to efficiently form aggregate word counts and a...
dice_coefficient_diff_table                 Lines In Both Documents via Dice Coefficients
dice_coefficient_line_matching              Lines In Both Documents via Dice Coefficients
distinct_words                              A function to find (semi)-distinct words in a list of term...
document_similarities                       Calculate sequence based document similarities
document_term_count_list                    Document Term Count List: Congressional Bills
document_term_vector_list                   Document Term Vector List: Congressional Bills
download_corenlp                            Checks the Java version on your computer and downloads...
download_mallet                             Checks the Java version on your computer and downloads MALLET...
edit_metrics                                Calculate Edit Metrics Between Two Document Versions
estimate_plots                              A function to generate parameter estimate plots with 95 percent...
feature_selection                           A function that implements a number of feature selection...
fightin_words_plot                          A function that generates plots similar to those in Monroe et...
frequency_threshold                         A function to frequency threshold a vector of strings.
generate_blocked_document_term_vectors      A function to generate and save blocks of document term...
generate_document_term_matrix               A function to generate a document term matrix from a list of...
generate_document_term_vectors              A function to generate document term vectors from a variety...
generate_sparse_large_document_term_matrix  A function to generate a sparse large document term matrix in...
get_file_paths                              A function that returns the file paths to two example raw...
get_unique_values_and_counts                Find unique values and the counts of those variables for a...
kill_zombies                                A function which takes no arguments and kills zombie R...
mallet_lda                                  A wrapper function for LDA using the MALLET machine learning...
multi_dice_coefficient_matching             Multiple N-Gram Length Dice Coefficient Document Matching
multi_plot                                  An implementation of matplot with nice coloring and automatic...
mutual_information                          Mutual Information
ngrams                                      Extracts N-Grams and phrases from a collection of documents...
ngram_sequence_matching                     N-Gram Sequence Matching
ngram_sequnce_plot                          N-Gram Sequence Matching
order_by_counts                             A function to generate an ordered word count dataframe from a...
pmi                                         A function to calculate a number of information-theoretic...
Processed_Text                              Twenty bills tokenized and tagged by CoreNLP
reference_distribution_distance             Reference distribution distances
sparse_doc_term_parallel                    Only to be used internally. A function to generate a sparse...
sparse_to_dense_matrix                      A function to convert a slam::simple_triplet_matrix sparse...
SpeedReader                                 SpeedReader: functions to facilitate high performance text...
speed_set_vocabulary                        A function that reorganizes vocabulary to speed up document...
tfidf                                       A function to calculate TF-IDF and other related statistics...
topic_coherence                             A function to calculate topic coherence for a given topic...
unlist_and_concatenate                      A function to unlist and concatenate a subset of a...
