jst_subset_ngrams | R Documentation |
This function helps define a subset of ngram files to import, since importing all ngrams at once can be very expensive in terms of CPU and memory.
jst_subset_ngrams(zip_archives, ngram_type, selection, by = file_name)
zip_archives |
A character vector of one or more zip files. |
ngram_type |
The type of ngram files to select, for instance "ngram1" as used in the example. |
selection |
A data.frame with the articles/books which are to be selected. |
by |
A column name in selection used for matching; defaults to file_name. |
A list of zip-locations which can be read via jst_get_ngram().
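To make the idea of matching archive entries against a selection more concrete, here is a minimal base R sketch. It is not the package's implementation: the archive paths, the naming scheme and the helper variables are all hypothetical and only illustrate filtering by ngram type and by a matching column.

```r
# Hypothetical listing of paths inside a zip archive (names are made up).
archive_paths <- c(
  "ngram1/article-a-ngram1.txt",
  "ngram1/article-b-ngram1.txt",
  "ngram2/article-a-ngram2.txt",
  "metadata/article-a.xml"
)

# Hypothetical selection: one row per article or book to keep.
selection <- data.frame(file_name = "article-a", stringsAsFactors = FALSE)

# Keep only paths that belong to the requested ngram type.
ngram_type <- "ngram1"
is_type <- startsWith(archive_paths, paste0(ngram_type, "/"))

# Derive an identifier from each path (assumed naming scheme) and match it
# against the column given by `by`, here "file_name".
ids <- sub(paste0("-", ngram_type, "\\.txt$"), "", basename(archive_paths))
keep <- is_type & ids %in% selection[["file_name"]]

subset_paths <- archive_paths[keep]
subset_paths
```

Only the entries that are both of the requested type and present in the selection remain, which is why subsetting first keeps the later import small.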
# create sample output
tmp <- tempdir()
jst_import_zip(jst_example("pseudo_dfr.zip"),
               import_spec = jst_define_import(book = jst_get_book),
               out_file = "test", out_path = tmp)
# re-import as our selection for which we would like to import ngrams
selection <- jst_re_import(file.path(tmp,
                                     "test_book_chapter_jst_get_book-1.csv"))
# get location of file
zip_loc <- jst_subset_ngrams(jst_example("pseudo_dfr.zip"), "ngram1",
                             selection)
# import ngram
jst_get_ngram(zip_loc[[1]])
unlink(tmp)
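The example imports only the first location. A possible way to import every selected ngram file and combine the results is sketched here; it assumes the jstor package is installed and that each call to jst_get_ngram() returns a data frame with the same columns. The temporary directory name and the use of rbind are choices made for this illustration.

```r
library(jstor)

# Use a dedicated temporary directory so it can be removed safely afterwards.
tmp <- tempfile("jst_ngrams_")
dir.create(tmp)

# Create the sample output and re-import it as the selection.
jst_import_zip(jst_example("pseudo_dfr.zip"),
               import_spec = jst_define_import(book = jst_get_book),
               out_file = "test", out_path = tmp)
selection <- jst_re_import(file.path(tmp,
                                     "test_book_chapter_jst_get_book-1.csv"))

# Locate the ngram files that belong to the selection.
zip_loc <- jst_subset_ngrams(jst_example("pseudo_dfr.zip"), "ngram1",
                             selection)

# Import each location in turn, then stack the results row-wise
# (assumes all results share the same columns).
all_ngrams <- lapply(zip_loc, jst_get_ngram)
combined <- do.call(rbind, all_ngrams)

# Remove the temporary directory and its contents.
unlink(tmp, recursive = TRUE)
```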