as.ft_data: Coerce directory of papers to ft_data object
In fulltext: Full Text of 'Scholarly' Articles Across Many Data Sources


Create the same object that ft_get() returns, built from your cached files, without having to run ft_get() again.

as.ft_data(path = NULL)

path

Cache path. If not given, the default cache path is used. Default: NULL

An internal store of identifiers keeps track of files. These identifiers appear in the output of ft_get(). If a file has no matching entry in that index (e.g., if you drop a file into the cache location, as in the example below), it is assigned an identifier based on its file path. Ideally this would be an article DOI or similar, but that cannot be safely retrieved from a file path alone.
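
The fallback described above can be pictured with a small base R sketch. This is an illustration only, not fulltext's internal code: the `index` vector, the directory name, and the DOI-like string "10.1234/example" are all made-up placeholders. It shows one plausible way a file with an index entry keeps its identifier while an unindexed file is keyed by its path.

```r
# Hypothetical illustration only: this is NOT fulltext's internal code.
# `index` stands in for a store that maps cached file paths to identifiers.
cache_dir <- file.path(tempdir(), "ft_sketch")
dir.create(cache_dir, showWarnings = FALSE)

# Two placeholder files: one "known" to the index, one dropped in by hand
known <- file.path(cache_dir, "known.xml")
dropped <- file.path(cache_dir, "dropped.xml")
writeLines("<article/>", known)
writeLines("<article/>", dropped)

# Only the first file has an entry in the (made-up) index
index <- c("10.1234/example")
names(index) <- known

# Look up each cached file; fall back to the file path when there is no entry
files <- list.files(cache_dir, pattern = "\\.xml$", full.names = TRUE)
ids <- ifelse(files %in% names(index), index[files], files)
names(ids) <- basename(files)
ids
```

Under these assumptions, `ids[["known.xml"]]` holds the placeholder identifier and `ids[["dropped.xml"]]` holds the file's own path.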

An object of class ft_data

ft_get()

# put a file in the cache in case there aren't any
dir <- file.path(tempdir(), "testing")
dir.create(dir, showWarnings = FALSE)
file <- system.file("examples", "elife.xml", package = "fulltext")
writeLines(readLines(file), tempfile(tmpdir = dir, fileext = ".xml"))

# call as.ft_data
x <- as.ft_data(path = dir)

# output lives underneath a special list index "cached" 
#   representing already present files
x$cached

## Not run: 
# collect chunks
if (requireNamespace("pubchunks", quietly = TRUE)) {
  library(pubchunks)
  res <- ft_collect(x)
  pub_chunks(res, c("doi", "title")) %>% pub_tabularize()
}

## End(Not run)