sentences: Sentences for Machine Learning (data)


Description

A dataset of 11,124,944 sentences drawn from the texts dataset, for use in machine learning with libraries such as spaCy.

Usage

data("sentences")

Format

doc_id: character

text: character

places_root: logical

Details

The titles and abstracts from the texts dataset, divided into sentences. For use with spaCy in Python, rename the places_root column to "answer" and replace TRUE with "accept" and FALSE with "reject". Then write the data to a file with jsonlite::stream_out(sentences, file("sentences.jsonl")). The resulting JSONL file (one JSON object per line) can then be loaded directly into spaCy.
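The conversion described above could be sketched in R roughly as follows. This is a minimal illustration, not the package author's own code: sentences_demo is a hypothetical three-row stand-in with the same columns as sentences, its values are invented placeholders, and the output goes to a temporary file rather than "sentences.jsonl". It also assumes "convert" means renaming the column.

```r
library(jsonlite)

# Hypothetical stand-in with the same columns as `sentences`.
# The values are placeholders, not rows from the real data;
# in practice, load the real data with data("sentences").
sentences_demo <- data.frame(
  doc_id = c("doc1", "doc1", "doc2"),
  text = c("First example sentence.",
           "Second example sentence.",
           "Third example sentence."),
  places_root = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Rename the places_root column to "answer".
names(sentences_demo)[names(sentences_demo) == "places_root"] <- "answer"

# Replace TRUE with "accept" and FALSE with "reject".
sentences_demo$answer <- ifelse(sentences_demo$answer, "accept", "reject")

# Write one JSON object per line (JSONL). A temporary path is used here
# so the sketch does not overwrite anything in the working directory.
out_path <- file.path(tempdir(), "sentences_demo.jsonl")
jsonlite::stream_out(sentences_demo, file(out_path), verbose = FALSE)
```

For the full dataset, the same steps would be applied to sentences itself, with file("sentences.jsonl") as the output connection as given in the text above.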


poldham/kenlitr documentation built on Nov. 5, 2019, 12:59 a.m.