Man pages for mlampros/textTinyR
Text Processing for Small or Big Data Files

batch_compute                 Compute batches
big_tokenize_transform        String tokenization and transformation for big data sets
bytes_converter               bytes converter of a text file (KB, MB or GB)
cluster_frequency             Frequencies of an existing cluster object
cosine_distance               cosine distance of two character strings (each string...
COS_TEXT                      Cosine similarity for text documents
Count_Rows                    Number of rows of a file
dense_2sparse                 convert a dense matrix to a sparse matrix
dice_distance                 dice similarity of words using n-grams
dims_of_word_vecs             dimensions of a word vectors file
Doc2Vec                       Conversion of text documents to word-vector-representation...
JACCARD_DICE                  Jaccard or Dice similarity for text documents
levenshtein_distance          levenshtein distance of two words
load_sparse_binary            load a sparse matrix in binary format
matrix_sparsity               sparsity percentage of a sparse matrix
read_characters               read a specific number of characters from a text file
read_rows                     read a specific number of rows from a text file
save_sparse_binary            save a sparse matrix in binary format
select_predictors             Exclude highly correlated predictors
sparse_Means                  RowMeans and colMeans for a sparse matrix
sparse_Sums                   RowSums and colSums for a sparse matrix
sparse_term_matrix            Term matrices and statistics ( document-term-matrix,...
TEXT_DOC_DISSIM               Dissimilarity calculation of text documents
text_file_parser              text file parser
text_intersect                intersection of words or letters in tokenized text
tokenize_transform_text       String tokenization and transformation ( character string or...
tokenize_transform_vec_docs   String tokenization and transformation ( vector of documents...
token_stats                   token statistics
utf_locale                    utf-locale for the available languages
vocabulary_parser             returns the vocabulary counts for small or medium ( xml and...
mlampros/textTinyR documentation built on Jan. 17, 2024, 1:18 a.m.