Text Processing for Small or Big Data Files

big_tokenize_transform | String tokenization and transformation for big data sets |

bytes_converter | bytes converter of a text file ( KB, MB or GB ) |

cluster_frequency | Frequencies of an existing cluster object |

cosine_distance | cosine distance of two character strings (each string... |

COS_TEXT | Cosine similarity for text documents |

Count_Rows | Number of rows of a file |

dense_2sparse | convert a dense matrix to a sparse matrix |

dice_distance | dice similarity of words using n-grams |

dims_of_word_vecs | dimensions of a word vectors file |

Doc2Vec | Conversion of text documents to word-vector-representation... |

JACCARD_DICE | Jaccard or Dice similarity for text documents |

levenshtein_distance | levenshtein distance of two words |

load_sparse_binary | load a sparse matrix in binary format |

matrix_sparsity | sparsity percentage of a sparse matrix |

read_characters | read a specific number of characters from a text file |

read_rows | read a specific number of rows from a text file |

save_sparse_binary | save a sparse matrix in binary format |

select_predictors | Exclude highly correlated predictors |

sparse_Means | RowMeans and colMeans for a sparse matrix |

sparse_Sums | RowSums and colSums for a sparse matrix |

sparse_term_matrix | Term matrices and statistics ( document-term-matrix,... |

TEXT_DOC_DISSIM | Dissimilarity calculation of text documents |

text_file_parser | text file parser |

text_intersect | intersection of words or letters in tokenized text |

tokenize_transform_text | String tokenization and transformation ( character string or... |

tokenize_transform_vec_docs | String tokenization and transformation ( vector of documents... |

token_stats | token statistics |

utf_locale | utf-locale for the available languages |

vocabulary_parser | returns the vocabulary counts for small or medium ( xml and... |
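
Several entries above name standard string-similarity measures (levenshtein_distance, dice_distance, cosine_distance). As a rough illustration of what such measures compute, here is a minimal Python sketch of the textbook definitions. It is not the package's implementation, and the function names and parameters (`levenshtein`, `dice_similarity`, `cosine_similarity`, `n`) are hypothetical, chosen only for this example; the package's own argument names and return conventions may differ.

```python
import math
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Textbook edit distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def char_ngrams(word: str, n: int = 2) -> set:
    """Set of character n-grams of a word (assumption: no padding)."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def dice_similarity(a: str, b: str, n: int = 2) -> float:
    """Dice coefficient on character n-gram sets: 2|X & Y| / (|X| + |Y|)."""
    x, y = char_ngrams(a, n), char_ngrams(b, n)
    if not x and not y:
        return 1.0  # assumption: two words with no n-grams count as identical
    return 2 * len(x & y) / (len(x) + len(y))


def cosine_similarity(s1: str, s2: str) -> float:
    """Cosine similarity of whitespace-token count vectors."""
    c1, c2 = Counter(s1.split()), Counter(s2.split())
    dot = sum(c1[t] * c2[t] for t in c1)
    norm = (math.sqrt(sum(v * v for v in c1.values()))
            * math.sqrt(sum(v * v for v in c2.values())))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
    print(dice_similarity("night", "nacht"))
    print(cosine_similarity("the cat sat", "the cat ran"))
```

A distance can be derived from either similarity as one minus the similarity; whether a given function in the index returns a similarity or a distance should be checked in its own help page.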
