as.ft_data: Coerce directory of papers to ft_data object
In ropensci/fulltext: Full Text of 'Scholarly' Articles Across Many Data Sources

as.ft_data

R Documentation

Coerce directory of papers to ft_data object

Description

Create the same object that ft_get() returns from files already in your cache, without having to run ft_get() again.

Usage

as.ft_data(path = NULL)

Arguments

path

Cache path. If not given, the default cache path is used. Default: NULL

Details

We use an internal store of identifiers to keep track of files. These identifiers appear in the output of ft_get(). If a file does not have a matching entry in our index of files (e.g., if you drop a file into the cache location as in the example below), we assign it an identifier based on the file path. Ideally we would use an article DOI or similar, but we cannot safely retrieve one from a file path alone.
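
The fallback described above can be illustrated with a minimal base R sketch. This is not the package's internal implementation: the `index` vector, the `resolve_id()` helper, the file paths, and the DOI are all hypothetical and exist only to show the idea of using the file path when no identifier is on record.

```r
# Hypothetical index mapping cached file paths to article identifiers
# (an assumption for illustration; not fulltext's internal store).
index <- c("cache/known_article.xml" = "10.1234/example.doi")

# Return the stored identifier for a path, or the path itself when
# the file has no entry in the index.
resolve_id <- function(path, index) {
  if (path %in% names(index)) index[[path]] else path
}

known   <- resolve_id("cache/known_article.xml", index)    # stored identifier
unknown <- resolve_id("cache/dropped_in_file.xml", index)  # falls back to the path
```

Under these assumptions, a file dropped into the cache by hand is still tracked, but only by its path rather than by a DOI.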

Value

An object of class ft_data.

Examples

# put a file in the cache in case there aren't any
dir <- file.path(tempdir(), "testing")
dir.create(dir)
file <- system.file("examples", "elife.xml", package = "fulltext")
writeLines(readLines(file), tempfile(tmpdir = dir, fileext = ".xml"))

# call as.ft_data
x <- as.ft_data(path = dir)

# output lives under a special list element, "cached",
#   representing files that are already present
x$cached

## Not run: 
# collect chunks
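if (requireNamespace("pubchunks", quietly = TRUE)) {
  library(pubchunks)
  res <- ft_collect(x)
  # extract the DOI and title from each article, then tabulate them
  # (nested call avoids depending on a pipe operator being attached)
  pub_tabularize(pub_chunks(res, c("doi", "title")))
}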

## End(Not run)

ropensci/fulltext documentation built on Sept. 12, 2022, 7:57 a.m.