jst_subset_ngrams: Define a subset of ngrams

View source: R/ngram.R

jst_subset_ngrams    R Documentation

Description

This function defines a subset of ngram files to import. Importing all ngrams at once can be very expensive in terms of CPU time and memory.
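The idea of narrowing a file listing before reading anything can be illustrated in base R. This is only a sketch of the concept, not the package's implementation: the file paths, the naming pattern, and the object names (`archive_files`, `wanted_type`, `subset_files`) are made up for illustration.

```r
# Hypothetical listing of files inside a zip archive (names are invented).
archive_files <- c(
  "ngram1/book-1-ngram1.txt",
  "ngram1/book-2-ngram1.txt",
  "ngram2/book-1-ngram2.txt"
)

# The selection: a data.frame with a column to match on.
selection <- data.frame(file_name = "book-1", stringsAsFactors = FALSE)

# Keep only files of the requested ngram type (assumed to be the folder name).
wanted_type <- "ngram1"
in_type <- dirname(archive_files) == wanted_type

# Strip the assumed "-ngramN.txt" suffix to recover an id for matching.
ids <- sub("-ngram[0-9]\\.txt$", "", basename(archive_files))

# Files that are of the right type and belong to a selected item.
subset_files <- archive_files[in_type & ids %in% selection$file_name]
```

Only the files left in `subset_files` would then need to be read, which is where the savings in time and memory come from.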

Usage

jst_subset_ngrams(zip_archives, ngram_type, selection, by = file_name)

Arguments

zip_archives

A character vector of paths to one or more zip files.

ngram_type

One of "ngram1", "ngram2" or "ngram3".

selection

A data.frame of the articles/books to be selected.

by

A column name used for matching. Defaults to file_name.

Value

A list of zip-locations which can be read via jst_get_ngram().
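Because the return value is a list, each element can be passed to jst_get_ngram() in turn. The helper below is a minimal sketch of that pattern; `import_selected_ngrams` and its `reader` argument are hypothetical names introduced here and are not part of the package.

```r
# Hypothetical helper: apply a reader to every location in the list
# returned by jst_subset_ngrams().
# The default for `reader` is evaluated lazily, so supplying another
# function works even where the jstor package is not installed.
import_selected_ngrams <- function(zip_loc, reader = jstor::jst_get_ngram) {
  lapply(zip_loc, reader)
}
```

With the `zip_loc` object from the Examples section, `import_selected_ngrams(zip_loc)` would return a list with one imported result per location.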

Examples

library(jstor)

# create sample output
tmp <- tempdir()
jst_import_zip(jst_example("pseudo_dfr.zip"),
               import_spec = jst_define_import(book = jst_get_book),
               out_file = "test", out_path = tmp)

# re-import as our selection for which we would like to import ngrams
selection <- jst_re_import(file.path(tmp, 
                                     "test_book_chapter_jst_get_book-1.csv"))

# get location of file
zip_loc <- jst_subset_ngrams(jst_example("pseudo_dfr.zip"), "ngram1",
                             selection) 

# import ngram
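jst_get_ngram(zip_loc[[1]])

# clean up the file created above; unlink() does not delete a directory
# unless recursive = TRUE, and tempdir() itself should be left in place
unlink(file.path(tmp, "test_book_chapter_jst_get_book-1.csv"))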

jstor documentation built on Aug. 16, 2023, 5:09 p.m.