texts: Texts for text mining and Machine Learning (data)
In poldham/kenlitr: Scientific Literature on Kenya

A dataset that combines the titles and abstracts from the lens dataset into a single text field for text mining and machine learning (ML).

data("texts")

id: character
texts: character

The titles and abstracts are first split into separate tables and then joined back together. A unique id is constructed from the paperid and the row number, separated by "_". The result is a data frame intended to comply with the emerging Text Interchange Format (TIF) favoured by quanteda, spacyr and similar packages. The paperid values can be reconstructed for joins at a later stage using tidyr::separate(). For use with spaCy in Python, export the data as line-delimited JSON, e.g. jsonlite::stream_out(texts, file("flights.jsonl")).
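The id round trip and the JSON export described above can be sketched as follows. This is a minimal illustration and not the package's own build script: the toy data frame, its paperid values, and the temporary output file are hypothetical stand-ins, and it assumes that paperid itself never contains "_".

```r
library(tidyr)
library(jsonlite)

# Hypothetical stand-in for the source records (not the packaged data).
toy <- data.frame(
  paperid = c("000-111-222-333-44X", "555-666-777-888-99X"),
  title   = c("Maize yields in Kenya", "Malaria vectors in Kisumu"),
  stringsAsFactors = FALSE
)

# Build ids as "<paperid>_<row number>", mirroring the scheme described above.
toy_texts <- data.frame(
  id    = paste(toy$paperid, seq_len(nrow(toy)), sep = "_"),
  texts = toy$title,
  stringsAsFactors = FALSE
)

# Recover paperid for later joins. sep = "_" is given explicitly because the
# default separator matches any non-alphanumeric character, including "-".
rejoined <- separate(toy_texts, id, into = c("paperid", "row"), sep = "_")

# Write one JSON object per line to a temporary .jsonl file (assumed path).
out_file <- tempfile(fileext = ".jsonl")
stream_out(toy_texts, file(out_file), verbose = FALSE)
```

Each line of the resulting file is a standalone JSON record with the fields id and texts, so it can be read line by line on the Python side before the text is passed to spaCy.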

poldham/kenlitr documentation built on Nov. 5, 2019, 12:59 a.m.