rtika: R Interface to 'Apache Tika'

rtika R Documentation

Description

Extract text or metadata from over a thousand file types. Get either plain text or structured XHTML content.

Installing

If you have not done so already, finish installing rtika by typing in the R console:

install_tika()
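The install step can be made safe to re-run by first checking whether the Tika jar is already present. This is a minimal sketch, not the package's prescribed procedure: it assumes Java is available on the system and that tika_jar() returns NA when no jar has been installed yet.

```r
# Sketch: install the Tika jar only when it is missing.
# Assumption: rtika::tika_jar() returns NA until install_tika() has succeeded.
if (requireNamespace("rtika", quietly = TRUE)) {
  if (is.na(rtika::tika_jar())) {
    rtika::install_tika()
  }
  # Print the jar location (or NA) to confirm the result.
  print(rtika::tika_jar())
}
```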

Getting Started

The tika_text function extracts plain text from many types of documents and is a good place to start. Please also read the vignette. The other main functions include tika_xml and tika_html, which return a structured XHTML rendition, and tika_json, which returns metadata as '.json' with XHTML content.

The tika_json_text function gets metadata as '.json', with plain text content.

tika is the main function; the others above inherit from it.
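A short sketch of how these functions might be called on a single file. It assumes rtika is installed, install_tika() has completed, and Java is available. The sample file jsonlite.pdf is assumed to ship in the package's extdata folder, and the variable names are illustrative only.

```r
# Sketch only: runs when rtika and the Tika jar are both available.
if (requireNamespace("rtika", quietly = TRUE) && !is.na(rtika::tika_jar())) {
  # Sample PDF assumed to be bundled with rtika.
  input <- system.file("extdata", "jsonlite.pdf", package = "rtika")

  # Plain text through the convenience function.
  plain <- rtika::tika_text(input)

  # The same request through the main function, naming the output format.
  plain_via_tika <- rtika::tika(input, output = "text")

  # Metadata as JSON, with plain text content.
  meta_json <- rtika::tika_json_text(input)

  # Preview the beginning of the extracted text.
  cat(substr(plain, 1, 200))
}
```

Each call returns one result per input file, so a character vector of several paths can be passed in the same way.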

Use tika_fetch to download files, saving each with a file extension that matches its Content-Type.
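As a rough illustration, downloaded files can be handed straight to the extraction functions. The URL below is only an example, network access is required, and the sketch assumes tika_fetch returns local file paths with NA for failed downloads.

```r
# Sketch only: requires rtika and network access.
if (requireNamespace("rtika", quietly = TRUE)) {
  # Example URL chosen for illustration.
  urls <- "https://tika.apache.org/"

  # Download; the result is assumed to be one local path per URL.
  files <- rtika::tika_fetch(urls)

  # Inspect the extensions that were assigned to the downloads.
  print(tools::file_ext(files))

  # Extract text from the successful downloads, if the Tika jar is installed.
  ok <- files[!is.na(files)]
  if (length(ok) > 0 && !is.na(rtika::tika_jar())) {
    page_text <- rtika::tika_text(ok)
  }
}
```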

Author(s)

Maintainer: Sasha Goodman goodmansasha@gmail.com

Authors:

  • The Apache Software Foundation [copyright holder]

Other contributors:

  • Julia Silge (Reviewed the package for rOpenSci, see https://github.com/ropensci/software-review/issues/191/) [reviewer]

  • David Gohel (Reviewed the package for rOpenSci, see https://github.com/ropensci/software-review/issues/191/) [reviewer]

rtika documentation built on Nov. 5, 2025, 5:27 p.m.