cord19_paragraphs: Full text of the papers in the CORD-19 dataset, separated...

Description

Full text of the papers, with one observation per paragraph. Includes only the papers that appear in cord19_papers, so the set is already deduplicated and filtered.

Usage

cord19_paragraphs

Format

A tibble with variables:

paper_id

Unique identifier that can link to metadata and citations. SHA of the paper PDF.

paragraph

Index of the paragraph within the paper (1, 2, 3, ...)

section

Section (e.g. Introduction, Results, Discussion). The casing is standardized to title case.

text

Full text of the paragraph

Source

https://www.kaggle.com/allen-institute-for-ai/CORD-19-research-challenge

Examples

library(dplyr)
library(stringr)

# What are the most common section titles?
cord19_paragraphs %>%
  count(section = str_to_lower(section), sort = TRUE)
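
Because paper_id is described as the key that links to metadata, a common next step is to summarise paragraphs per paper and join the result to a metadata table. The sketch below is only illustrative: `paragraphs` and `papers` are small made-up stand-ins (not the real data), and the `title` column is an assumption about what a metadata table might contain.

```r
library(dplyr)
library(tibble)

# Toy stand-in with the same four columns as cord19_paragraphs (values are made up)
paragraphs <- tribble(
  ~paper_id, ~paragraph, ~section,       ~text,
  "abc123",  1L,         "Introduction", "First paragraph.",
  "abc123",  2L,         "Results",      "Second paragraph.",
  "def456",  1L,         "Introduction", "Only paragraph."
)

# Hypothetical metadata table keyed by paper_id (the title column is assumed)
papers <- tribble(
  ~paper_id, ~title,
  "abc123",  "Paper A",
  "def456",  "Paper B"
)

# Count paragraphs per paper, then attach the metadata through paper_id
paragraph_counts <- paragraphs %>%
  count(paper_id, name = "n_paragraphs") %>%
  left_join(papers, by = "paper_id")
```

With the real data, the same pipeline would presumably start from cord19_paragraphs and join to cord19_papers, provided both share a paper_id column as the Format section suggests.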

dgrtwo/cord19 documentation built on March 20, 2020, 12:44 a.m.